Genes and proteins involved in the biosynthesis of enediyne ring structures

ABSTRACT

Five protein families cooperate to form the warhead structure that characterizes enediyne compounds, both chromoprotein enediynes and non-chromoprotein enediynes. The protein families include a polyketide synthase and thioesterase protein which form a polyketide synthase catalytic complex involved in warhead formation in enediynes. Genes encoding a member of each of the five protein families are found in all enediyne biosynthetic loci. The genes and proteins may be used in genetic engineering applications to design new enediyne compounds and in methods to identify new enediyne biosynthetic loci.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims benefit under 35 USC § 119 of provisional applications U.S. Ser. No. 60/291,959 filed on May 21, 2001 and U.S. Ser. No. 60/334,604 filed on Dec. 3, 2001, which are hereby incorporated by reference in their entirety for all purposes.

FIELD OF INVENTION

The present invention relates to the field of microbiology, and more specifically to genes and proteins involved in the production of enediynes.

BACKGROUND

Enediyne natural products are characterized by the presence of the enediyne ring structure, also referred to as the warhead. The labile enediyne ring structure undergoes a thermodynamically favorable Bergman cyclization resulting in transient formation of a biradical species. The biradical species is capable of inducing irreversible DNA damage in the cell. This reactivity gives rise to potential biological activity against both bacterial and tumor cell lines. Enediynes have potential as anticancer agents because of their ability to cleave DNA. Calicheamicin is currently in clinical trials as an anticancer agent for acute myeloid leukemia (Nabhan C. and Tallman M S, Clin Lymphoma (2002) March; 2 Suppl 1: S19-23). Enediynes also have utility as anti-infective agents. Accordingly, processes for improving production of existing enediynes or producing novel modified enediynes are of great interest to the pharmaceutical industry.

Enediynes are a structurally diverse group of compounds. Chromoprotein enediynes refer to enediynes associated with a protein conferring stability to the complex under physiological conditions. Non-chromoprotein enediynes refer to enediynes that require no additional stabilization factors. The structure of the chromoprotein enediynes neocarzinostatin and C-1027, and the non-chromoprotein enediynes calicheamicin and dynemicin are shown below with the dodecapolyene backbone forming the warhead structure in each enediyne highlighted in bold.

Efforts at discovering the genes responsible for synthesis of the warhead structure that characterizes enediynes have been unsuccessful. Genes encoding biosynthetic enzymes for the aryltetrasaccharide of calicheamicin, and for calicheamicin resistance, are described in WO 00/37608. Additional genes involved in the biosynthesis of the chromoprotein enediyne C-1027 have been isolated (Liu, et al. Antimicrobial Agents and Chemotherapy, vol. 44, pp 382-392 (2000); WO 00/40596). Isotopic incorporation experiments have indicated that the enediyne backbones of esperamicin, dynemicin, and neocarzinostatin are acetate derived (Hansens, O. D. et al. J. Am. Chem Soc. 11, vol 111 pp. 3295-3299 (1989); Lam, K. et al. J. Am. Chem. Soc. vol. 115, pp 12340-12345 (1993); Tokiwa, Y et al. J. Am. Chem Soc. vol. 113 pp. 4107-4110). However, both PCR and DNA probes homologous to type I and type II PKSs have failed to identify the presence of PKS genes associated with biosynthesis of enediynes in known enediyne producing microorganisms (WO 00/40596; W. Liu & B. Shen, Antimicrobial Agents Chemotherapy, vol. 44 No. 2 pp. 382-392 (2000)).

Elucidation of the genes involved in biosynthesis of enediynes, particularly the warhead structure, would provide access to rational engineering of enediyne biosynthesis for novel drug leads and would make it possible to construct overproducing strains by de-regulating the biosynthetic machinery. Elucidation of PKS genes involved in the biosynthesis of enediynes would contribute to the field of combinatorial biosynthesis by expanding the repertoire of PKS genes available for making novel enediynes via combinatorial biosynthesis.

Existing screening methods for identifying enediyne-producing microbes are laborious, time-consuming and have not provided sufficient discrimination to date to detect organisms producing enediyne natural products at low levels. There is a need for improved tools to detect enediyne-producing organisms. There is also a need for tools capable of detecting organisms that produce enediynes at levels that are not detected by traditional culture tests.

SUMMARY OF THE INVENTION

One embodiment of the present invention is an isolated, purified or enriched nucleic acid comprising a sequence selected from the group consisting of: (a) SEQ ID NOS: 2, 14, 24, 34, 44, 54, 64, 74, 84, 94; sequences complementary to SEQ ID NOS: 2, 14, 24, 34, 44, 54, 64, 74, 84, 94; fragments comprising 2000, preferably 3000, more preferably 4000, still more preferably 5000, still more preferably 5600 and most preferably 5750 consecutive nucleotides of SEQ ID NOS: 2, 14, 24, 34, 44, 54, 64, 74, 84, 94; and fragments comprising 2000, preferably 3000, more preferably 4000, still more preferably 5000, still more preferably 5600 and most preferably 5750 consecutive nucleotides of the sequences complementary to SEQ ID NOS: 2, 14, 24, 34, 44, 54, 64, 74, 84, 94; (b) SEQ ID NOS: 4, 6, 16, 26, 36, 46, 56, 66, 76, 86, 96; sequences complementary to SEQ ID NOS: 4, 6, 16, 26, 36, 46, 56, 66, 76, 86, 96; fragments comprising 150, preferably 200, more preferably 250, still more preferably 300, still more preferably 350 and most preferably 400 consecutive nucleotides of SEQ ID NOS: 4, 6, 16, 26, 36, 46, 56, 66, 76, 86, 96; and fragments comprising 150, preferably 200, more preferably 250, still more preferably 300, still more preferably 350 and most preferably 400 consecutive nucleotides of the sequences complementary to SEQ ID NOS: 4, 6, 16, 26, 36, 46, 56, 66, 76, 86, 96; (c) SEQ ID NOS: 8, 18, 28, 38, 48, 58, 68, 78, 88, 98; sequences complementary to SEQ ID NOS: 8, 18, 28, 38, 48, 58, 68, 78, 88, 98; fragments comprising 700, preferably 750, more preferably 800, still more preferably 850, still more preferably 900 and most preferably 950 consecutive nucleotides of SEQ ID NOS: 8, 18, 28, 38, 48, 58, 68, 78, 88, 98; and fragments comprising 700, preferably 750, more preferably 800, still more preferably 850, still more preferably 900 and most preferably 950 consecutive nucleotides of the sequences complementary to SEQ ID NOS: 8, 18, 28, 38, 48, 58, 68, 78, 88, 98; (d) SEQ ID NOS: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100; sequences complementary to SEQ ID NOS: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100; fragments comprising 600, preferably 700, more preferably 750, still more preferably 800, still more preferably 850 and most preferably 900 consecutive nucleotides of SEQ ID NOS: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100; and fragments comprising 600, preferably 700, more preferably 750, still more preferably 800, still more preferably 850 and most preferably 900 consecutive nucleotides of the sequences complementary to SEQ ID NOS: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100; and (e) SEQ ID NOS: 12, 22, 32, 42, 52, 62, 72, 82, 92, 102; sequences complementary to SEQ ID NOS: 12, 22, 32, 42, 52, 62, 72, 82, 92, 102; fragments comprising 700, preferably 750, more preferably 800, still more preferably 850, still more preferably 900 and most preferably 950 consecutive nucleotides of SEQ ID NOS: 12, 22, 32, 42, 52, 62, 72, 82, 92, 102; and fragments comprising 700, preferably 750, more preferably 800, still more preferably 850, still more preferably 900 and most preferably 950 consecutive nucleotides of the sequences complementary to SEQ ID NOS: 12, 22, 32, 42, 52, 62, 72, 82, 92, 102. One aspect of the present invention is an isolated, purified or enriched nucleic acid capable of hybridizing to the nucleic acid of this embodiment under conditions of high stringency. Another aspect of the present invention is an isolated, purified or enriched nucleic acid capable of hybridizing to the nucleic acid of this embodiment under conditions of moderate stringency. Another aspect of the present invention is an isolated, purified or enriched nucleic acid capable of hybridizing to the nucleic acid of this embodiment under conditions of low stringency. Another aspect of the present invention is an isolated, purified or enriched nucleic acid having at least 70% homology to the nucleic acid of this embodiment as determined by analysis with BLASTN version 2.0 with the default parameters. Another aspect of the present invention is an isolated, purified or enriched nucleic acid having at least 99% homology to the nucleic acid of this embodiment as determined by analysis with BLASTN version 2.0 with the default parameters.

Another embodiment is an isolated, purified or enriched nucleic acid that encodes an enediyne polyketide synthase protein comprising a polypeptide selected from the group consisting of: (a) SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93; (b) polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83 or 93 during synthesis of a warhead structure in an enediyne compound; and (c) fragments of the polypeptides of (a) and (b), which fragments have the ability to substitute for a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93 in the synthesis of the warhead structure in an enediyne compound. In one aspect of this embodiment, the nucleic acid encoding an enediyne polyketide synthase protein may be used in genetic engineering applications to synthesize the warhead structure of an enediyne compound.

Another embodiment is an isolated, purified or enriched nucleic acid that encodes an enediyne polyketide synthase catalytic complex comprising (a) a polypeptide selected from the group consisting of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93; polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83 or 93 during synthesis of a warhead structure in an enediyne compound; and fragments thereof, which fragments have the ability to substitute for a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93 in the synthesis of the warhead structure in an enediyne compound; and (b) a polypeptide selected from the group consisting of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95; polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95 during synthesis of a warhead structure in an enediyne compound; and fragments thereof, which fragments have the ability to substitute for a polypeptide of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95 in the synthesis of the warhead structure in an enediyne compound. In one aspect of this embodiment, the nucleic acid encoding an enediyne polyketide synthase catalytic complex may be used in genetic engineering applications to synthesize the warhead structure of an enediyne compound.

Another embodiment is an isolated, purified or enriched nucleic acid encoding a gene cassette comprising: (a) a nucleic acid encoding an enediyne polyketide synthase catalytic complex as described above; and (b) at least one nucleic acid encoding a polypeptide selected from the group consisting of (i) SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97; polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97 during synthesis of a warhead structure in an enediyne compound; and fragments thereof, which fragments have the ability to substitute for a polypeptide of SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97 in the synthesis of the warhead structure in an enediyne compound; (ii) SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99; polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99 during synthesis of a warhead structure in an enediyne compound; and fragments thereof, which fragments have the ability to substitute for a polypeptide of SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99 in the synthesis of the warhead structure in an enediyne compound; and (iii) SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101; polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101 during synthesis of a warhead structure in an enediyne compound; and fragments thereof, which fragments have the ability to substitute for a polypeptide of SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101 in the synthesis of the warhead structure in an enediyne compound. In one aspect of this embodiment, the nucleic acid encoding the gene cassette may be used in genetic engineering applications to synthesize the warhead structure of an enediyne compound.

Another embodiment is an isolated, purified or enriched nucleic acid encoding a gene cassette comprising: (a) a nucleic acid encoding a polypeptide selected from the group consisting of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93; a polypeptide having at least 75% homology to a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83 or 93 during synthesis of a warhead structure in an enediyne compound; or a fragment thereof, which fragment has the ability to substitute for a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93 in the synthesis of the warhead structure in an enediyne compound; (b) at least one nucleic acid encoding a polypeptide selected from the group consisting of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95; a polypeptide having at least 75% homology to a polypeptide of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95 during synthesis of a warhead structure in an enediyne compound; or a fragment thereof, which fragment has the ability to substitute for a polypeptide of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95 in the synthesis of the warhead structure in an enediyne compound; (c) at least one nucleic acid encoding a polypeptide selected from the group consisting of SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97; a polypeptide having at least 75% homology to a polypeptide of SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97 during synthesis of a warhead structure in an enediyne compound; and a fragment thereof, which fragment has the ability to substitute for a polypeptide of SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97 in the synthesis of the warhead structure in an enediyne compound; (d) at least one nucleic acid encoding a polypeptide selected from SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99; a polypeptide having at least 75% homology to a polypeptide of SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99 during synthesis of a warhead structure in an enediyne compound; and a fragment thereof, which fragment has the ability to substitute for a polypeptide of SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99 in the synthesis of the warhead structure in an enediyne compound; and (e) at least one nucleic acid encoding a polypeptide selected from SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101; a polypeptide having at least 75% homology to a polypeptide of SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101 during synthesis of a warhead structure in an enediyne compound; and a fragment thereof, which fragment has the ability to substitute for a polypeptide of SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101 in the synthesis of the warhead structure in an enediyne compound. In one aspect of this embodiment, the nucleic acid encoding the gene cassette may be used in genetic engineering applications to synthesize the warhead structure of an enediyne compound.

Another embodiment of the present invention is an isolated or purified polypeptide comprising a sequence selected from the group consisting of: (a) SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93; and fragments comprising 1300, preferably 1450, more preferably 1550, still more preferably 1650, still more preferably 1750 and most preferably 1850 consecutive amino acids of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93; (b) SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95; and fragments comprising 40, preferably 60, more preferably 80, still more preferably 100, still more preferably 120 and most preferably 130 consecutive amino acids of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95; (c) SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97; and fragments comprising 220, preferably 240, more preferably 260, still more preferably 280, still more preferably 300 and most preferably 310 consecutive amino acids of SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97; (d) SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99; and fragments comprising 520, preferably 540, more preferably 560, still more preferably 580, still more preferably 600 and most preferably 620 consecutive amino acids of SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99; and (e) SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101; and fragments comprising 220, preferably 240, more preferably 260, still more preferably 280, still more preferably 300 and most preferably 320 consecutive amino acids of SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91 and 101. One aspect of the present invention is an isolated or purified polypeptide having at least 70% homology to the polypeptide of this embodiment as determined by analysis with the BLASTP algorithm with the default parameters. Another aspect of the present invention is an isolated or purified polypeptide having at least 99% homology to the polypeptide of this embodiment as determined by analysis with the BLASTP algorithm with the default parameters.

Another embodiment is an isolated or purified enediyne polyketide synthase comprising a polypeptide selected from the group consisting of (a) SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93; (b) polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83 or 93 during synthesis of a warhead structure in an enediyne compound; and (c) fragments of the polypeptides of (a) and (b), which fragments have the ability to substitute for a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93 in the synthesis of the warhead structure in an enediyne compound. In one aspect of this embodiment, the enediyne polyketide synthase protein may be used in genetic engineering applications to synthesize the warhead structure of an enediyne compound.

Another embodiment is an isolated or purified enediyne polyketide synthase catalytic complex comprising (a) a polypeptide selected from the group consisting of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93; polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83 or 93 during synthesis of a warhead structure in an enediyne compound; and fragments thereof, which fragments have the ability to substitute for a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93 in the synthesis of the warhead structure in an enediyne compound; and (b) a polypeptide selected from the group consisting of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95; polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95 during synthesis of a warhead structure in an enediyne compound; and fragments thereof, which fragments have the ability to substitute for a polypeptide of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95 in the synthesis of the warhead structure in an enediyne compound. In one aspect of this embodiment, the enediyne polyketide synthase catalytic complex may be used in genetic engineering applications to synthesize the warhead structure of an enediyne compound.

In another embodiment, the invention is a polypeptide selected from the group consisting of: (a) SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97; (b) polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97 during synthesis of a warhead structure in an enediyne compound; and (c) fragments of (a) or (b), which fragments have the ability to substitute for a polypeptide of SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97 in the synthesis of the warhead structure in an enediyne compound. In one aspect, the polypeptide of this embodiment may be used with an enediyne polyketide synthase catalytic complex of the invention in genetic engineering applications to synthesize the warhead structure of an enediyne compound.

In another embodiment, the invention is a polypeptide selected from the group consisting of: (a) SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99; (b) polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99 during synthesis of a warhead structure in an enediyne compound; and (c) fragments of (a) or (b), which fragments have the ability to substitute for a polypeptide of SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99 in the synthesis of the warhead structure in an enediyne compound. In one aspect, the polypeptide of this embodiment may be used with an enediyne polyketide synthase catalytic complex of the invention in genetic engineering applications to synthesize the warhead structure of an enediyne compound.

In another embodiment, the invention is a polypeptide selected from the group consisting of: (a) SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101; (b) polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for a polypeptide of SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101 during synthesis of a warhead structure in an enediyne compound; and (c) fragments of (a) or (b), which fragments have the ability to substitute for a polypeptide of SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101 in the synthesis of the warhead structure in an enediyne compound. In one aspect, the polypeptide of this embodiment may be used with an enediyne polyketide synthase catalytic complex of the invention in genetic engineering applications to synthesize the warhead structure of an enediyne compound.

An enediyne gene cluster may be identified using compositions of the invention such as hybridization probes or PCR primers. Hybridization probes or PCR primers according to the invention are derived from protein families associated with the warhead structure characteristic of enediynes. To identify enediyne gene clusters, the hybridization probes or PCR primers are derived from any one or more nucleic acid sequences corresponding to the five protein families designated herein as PKSE, TEBC, UNBL, UNBV and UNBU. The compositions of the invention are used as probes to identify enediyne biosynthetic genes, enediyne gene fragments, enediyne gene clusters, or enediyne producing organisms from samples including potential enediyne producing microorganisms. The samples may be in the form of environmental biomass, pure or mixed microbial culture, isolated genomic DNA from pure or mixed microbial culture, or genomic DNA libraries from pure or mixed microbial culture. The compositions are used in polymerase chain reaction and nucleic acid hybridization techniques well known to those skilled in the art.

Environmental samples that harbour microorganisms with the potential to produce enediynes are identified by PCR methods. Nucleic acids contained within the environmental sample are contacted with primers derived from the invention so as to amplify target enediyne biosynthetic gene sequences. Environmental samples deemed to be positive by PCR are then pursued to identify and isolate the enediyne gene cluster and the microorganism that contains the target gene sequences. The enediyne gene cluster may be identified by generating genomic DNA libraries (for example, cosmid, BAC, etc.) representative of genomic DNA from the population of various microorganisms contained within the environmental sample, locating genomic DNA clones that contain the target sequences and possibly overlapping clones (for example, by hybridization techniques or PCR), determining the sequence of the desired genomic DNA clones and deducing the ORFs of the enediyne biosynthetic locus. The microorganism that contains the enediyne biosynthetic locus may be identified and isolated, for example, by colony hybridization using nucleic acid probes derived from either the invention or the newly identified enediyne biosynthetic locus. The isolated enediyne biosynthetic locus may be introduced into an appropriate surrogate host to achieve heterologous production of the enediyne compound(s); alternatively, if the microorganism containing the enediyne biosynthetic locus is identified and isolated, it may be subjected to fermentation to produce the enediyne compound(s).
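The PCR screening step described above can be mimicked computationally when sequence data is already in hand. The following is a minimal sketch only, assuming exact primer matches on a single strand; real primers derived from the five protein families would typically be degenerate and tolerate mismatches. The function names and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
# Minimal in silico PCR sketch (illustrative assumption, not the patented method).
# Exact string matching stands in for primer annealing.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_amplicon(template: str, fwd_primer: str, rev_primer: str):
    """Return the region a primer pair would amplify, or None if either site is absent.

    The forward primer must match the given strand; the reverse primer must
    match the opposite strand downstream of the forward site.
    """
    template = template.upper()
    start = template.find(fwd_primer.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so search for its
    # reverse complement on the given strand.
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


# Toy example: hypothetical template and primers.
toy_template = "GGGATGCCGTTTTAAAACCGTAGCC"
print(find_amplicon(toy_template, "ATGCCG", "CTACGG"))
```

A sample for which `find_amplicon` returns a sequence would correspond to a PCR-positive sample in the workflow above; a return value of `None` corresponds to a negative one.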

A microorganism that harbours an enediyne gene cluster is first identified and isolated as a pure culture, for example, by colony hybridization using nucleic acid probes derived from the invention. Beginning with a pure culture, a genomic DNA library (for example, cosmid, BAC, etc.) representative of genomic DNA from this single species is prepared, genomic DNA clones that contain the target sequences and possibly overlapping clones are located using probes derived from the invention (for example, by hybridization techniques or PCR), the sequence of the desired genomic DNA clones is determined and the ORFs of the enediyne biosynthetic locus are deduced. The microorganism containing the enediyne biosynthetic locus may be subjected to fermentation to produce the enediyne compound(s) or the enediyne biosynthetic locus may be introduced into an appropriate surrogate host to achieve heterologous production of the enediyne compound(s).

An enediyne gene cluster may also be identified in silico using one or more sequences selected from enediyne-specific nucleic acid code and enediyne-specific polypeptide code as taught by the invention. A query from a set of query sequences stored on computer readable medium is read and compared to a subject selected from the reference sequences of the invention. The level of similarity between said subject and query is determined, and query sequences representing enediyne genes are identified.
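The query-versus-subject comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the comparator of the invention: it uses a plain Needleman-Wunsch global alignment with arbitrary scores (match +1, mismatch -1, gap -1) in place of BLASTP, so its percent identity is not equivalent to BLASTP homology, and the 75% cut-off is borrowed from the text only as a placeholder. The reference sequences shown are invented toy strings, and all function names are hypothetical.

```python
# Sketch of an in silico query/subject comparison (assumed scoring, toy data).

def global_identity(query: str, subject: str, match=1, mismatch=-1, gap=-1) -> float:
    """Percent identity over a Needleman-Wunsch global alignment."""
    n, m = len(query), len(subject)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if query[i - 1] == subject[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical and total columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if query[i - 1] == subject[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += query[i - 1] == subject[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * identical / columns if columns else 0.0


def classify(query: str, references: dict, threshold: float = 75.0):
    """Return (family, identity) for the best reference at or above threshold, else None."""
    best = None
    for family, subjects in references.items():
        for subject in subjects:
            identity = global_identity(query, subject)
            if identity >= threshold and (best is None or identity > best[1]):
                best = (family, identity)
    return best


# Toy reference set keyed by family designation (sequences are invented).
toy_references = {"PKSE": ["MKTAYIAK"], "TEBC": ["GGSLLPWW"]}
print(classify("MKTAYIAR", toy_references))
```

In practice the reference set would hold the full-length sequences of the invention, and a heuristic local aligner would replace the quadratic-time alignment used here.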

Thus another embodiment of the invention is a method of identifying an enediyne biosynthetic gene or gene fragment comprising providing a sample containing genomic DNA, and detecting the presence of a nucleic acid sequence coding for a polypeptide from at least one of the groups consisting of: (a) SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93; and polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93 as determined using the BLASTP algorithm with the default parameters; (b) SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95; and polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95 as determined using the BLASTP algorithm with the default parameters; (c) SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97; and polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97 as determined using the BLASTP algorithm with the default parameters; (d) SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99; and polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99 as determined using the BLASTP algorithm with the default parameters; and (e) SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101; and polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91 and 101 as determined using the BLASTP algorithm with the default parameters. One aspect of this embodiment provides detecting a nucleic acid sequence coding for a polypeptide from at least two of the above groups (a), (b), (c), (d) and (e). Another aspect of this embodiment provides detecting a nucleic acid sequence coding for a polypeptide from at least three of the groups (a), (b), (c), (d) and (e). Another aspect of this embodiment provides detecting a nucleic acid sequence coding for a polypeptide from at least four of the groups (a), (b), (c), (d) and (e). Another aspect of this embodiment provides detecting a nucleic acid sequence coding for a polypeptide from each of the groups (a), (b), (c), (d) and (e). Another aspect of this embodiment of the invention provides the further step of using the nucleic acid detected to isolate an enediyne gene cluster from the sample containing genomic DNA. Another aspect of this embodiment of the invention comprises identifying an organism containing the nucleic acid sequence detected from the genomic DNA in the sample.
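The graded detection criteria above (a hit in at least one, two, three, four, or all five of groups (a) to (e)) reduce to counting distinct groups among the detected sequences. The following is a minimal sketch, assuming that each detected sequence has already been assigned a group label by some upstream comparison; the function names and the label strings are hypothetical.

```python
# Sketch: counting how many of the five groups (a)-(e) are represented in a sample.

GROUPS = ("a", "b", "c", "d", "e")


def groups_detected(hit_labels):
    """Return the set of distinct groups (a)-(e) present among the hit labels."""
    return {label for label in hit_labels if label in GROUPS}


def meets_criterion(hit_labels, min_groups: int = 1) -> bool:
    """True if hits cover at least min_groups distinct groups.

    min_groups=1 corresponds to the base method; 2, 3, 4 and 5 correspond to
    the successively stricter aspects described in the text.
    """
    return len(groups_detected(hit_labels)) >= min_groups


# Toy example: repeated hits to the same group count once.
sample_hits = ["a", "c", "c", "e"]
print(sorted(groups_detected(sample_hits)), meets_criterion(sample_hits, 3))
```

Requiring more groups trades sensitivity for specificity: a single hit may flag a partial or fragmentary locus, whereas hits in all five groups point to a complete warhead gene set.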

It is understood that the invention, having provided compositions and methods to identify enediyne biosynthetic gene clusters, further provides enediynes produced by the biosynthetic gene clusters identified.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a computer system which implements and executes software tools for the purpose of comparing a query to a subject, wherein the subject is selected from the reference sequences of the invention.

FIGS. 2A, 2B, 2C and 2D are flow diagrams of a sequence comparison software that can be employed for the purpose of comparing a query to a subject, wherein the subject is selected from the reference sequences of the invention, wherein FIG. 2A is the query initialization subprocess of the sequence comparison software, FIG. 2B is the subject datasource initialization subprocess of the sequence comparison software, FIG. 2C illustrates the comparison subprocess and the analysis subprocess of the sequence comparison software, and FIG. 2D is the Display/Report subprocess of the sequence comparison software.

FIG. 3 is a flow diagram of the comparator algorithm (238) of FIG. 2C which is one embodiment of a comparator algorithm that can be used for pairwise determination of similarity between a query/subject pair.

FIG. 4 is a flow diagram of the analyzer algorithm (244) of FIG. 2C which is one embodiment of an analyzer algorithm that can be used to assign identity to a query sequence, based on similarity to a subject sequence, where the subject sequence is a reference sequence of the invention.

FIG. 5 is a schematic representation comparing the calicheamicin enediyne biosynthetic locus from Micromonospora echinospora subsp. calichensis (CALI), the macromomycin (auromomycin) enediyne biosynthetic locus from Streptomyces macromyceticus (MACR), and a chromoprotein enediyne biosynthetic locus from Streptomyces ghanaensis (009C). Open reading frames in each locus are identified by boxes; gray boxes indicate ORFs that are not common to the three enediyne loci, black boxes indicate ORFs that are common to the three enediyne loci and are labeled using a four-letter protein family designation. The scale is in kilobases.

FIG. 6 illustrates the 5 genes conserved throughout ten enediyne biosynthetic loci from diverse genera, including both chromoprotein and non-chromoprotein enediyne loci.

FIG. 7 is a graphical depiction of the domain architecture typical of enediyne polyketide synthases (PKSE).

FIG. 8 is an amino acid clustal alignment of full length enediyne polyketide synthase (PKSE) proteins from ten enediyne biosynthetic loci. Approximate domain boundaries are indicated above the alignment. Conserved residues or motifs important for the function of each domain are highlighted in black.

FIG. 9A is an amino acid clustal alignment comparing the acyl carrier protein (ACP) domain of the PKSEs from three known enediynes, macromomycin (MACR), calicheamicin (CALI), and neocarzinostatin (NEOC), and the ACP domain of the actinorhodin Type II PKS system (1AF8). FIG. 9B depicts the space-filling side-chains of the conserved residues on the three dimensional structure of the ACP of the actinorhodin Type II PKS system (1AF8).

FIG. 10A is an amino acid clustal alignment comparing the 4′-phosphopantetheinyl transferase (PPTE) domain of the PKSEs from three known enediynes, macromomycin (MACR), calicheamicin (CALI), and neocarzinostatin (NEOC), and the 4′-phosphopantetheinyl transferase, Sfp, of Bacillus subtilis (sfp). Conserved residues are boxed. The known secondary structure of Sfp is shown below the aligned sequences and the predicted secondary structure of the PPTE domain of the PKSE is shown above the aligned sequences wherein the boxes indicate α-helices and the arrows indicate β-sheets. FIG. 10B shows how the conserved residues of the 4′-phosphopantetheinyl transferase Sfp co-ordinate a magnesium ion and coenzyme A; corresponding residues in the neocarzinostatin PPTE domain are shown in bold.

FIG. 11 is an amino acid clustal alignment of eleven TEBC proteins and 4-hydroxybenzoyl-CoA thioesterase (1BVQ) superimposed with the secondary structure of 1BVQ. Alpha-helices (α) and beta-sheets (β) are depicted by arrows.

FIG. 12 is an amino acid clustal alignment of ten UNBL proteins.

FIG. 13 is an amino acid clustal alignment of ten UNBV proteins highlighting the putative N-terminal signal sequence that likely targets these proteins for secretion.

FIG. 14 is an amino acid clustal alignment of ten UNBU proteins highlighting the putative transmembrane domains that likely anchor this family of proteins within the cell membrane.

FIG. 15 shows restriction site and functional maps of plasmids pECO1202-CALI-1 and pECO1202-CALI-4 of the invention. The open reading frames of the genes forming an expression cassette according to the invention are shown as arrows pointing in the direction of transcription.

FIG. 16 shows restriction site and functional maps of plasmids pECO1202-CALI-5, pECO1202-CALI-2, pECO1202-CALI-3, pECO1202-CALI-6 and pECO1202-CALI-7. The open reading frames of the genes forming the expression cassette according to the invention are shown as arrows pointing in the direction of transcription.

FIG. 17 is an immunoblot analysis of His-tagged TEBC protein in total protein extracts from recombinant S. lividans TK24 clones harboring the pECO1202-CALI-2 or the pECO1202-CALI-4 expression vector.

FIG. 18 is an immunoblot analysis of His-tagged TEBC protein in fractionated extracts from recombinant S. lividans TK24 clones harboring the pECO1202-CALI-2 expression vector.

DETAILED DESCRIPTION OF THE INVENTION

The invention provides enediyne related compositions. The compositions can be used to produce enediyne-related compounds. The compositions can also be used to identify enediyne natural products, enediyne genes, enediyne gene clusters and enediyne producing organisms. The invention rests on the surprising discovery that all enediynes, including chromoprotein enediynes and non-chromoprotein enediynes, use a conserved set of genes for formation of the warhead structure.

To provide the compositions and methods of the invention, a sample of the microorganism Streptomyces macromyceticus was obtained and the biosynthetic locus for the chromoprotein enediyne macromomycin was identified. The gene cluster was identified as the biosynthetic locus for macromomycin from Streptomyces macromyceticus NRRL B-5335 (sometimes referred to herein as MACR), firstly by confirming the sequence encoding the apoprotein associated with the chromoprotein, which sequence is disclosed in Samy T S et al., J. Biol. Chem (1983) January 10; 258(1) pp. 183-91, and secondly using the genome scanning procedure disclosed in co-pending application U.S. Ser. No. 09/910,813.

A sample of the microorganism Micromonospora echinospora subsp. calichensis was then obtained and the full biosynthetic locus for the non-chromoprotein enediyne calicheamicin was identified. The gene cluster was identified as the biosynthetic locus for calicheamicin from Micromonospora echinospora subsp. calichensis NRRL 15839 (sometimes referred to herein as CALI) by comparing the sequence with the partial locus for CALI which was disclosed in WO 00/40596. We were able to overcome the problems encountered in prior attempts to isolate and clone the entire biosynthetic locus by using a shotgun-based approach as described in co-pending application U.S. Ser. No. 09/910,813.

We identified two further enediyne natural product biosynthetic loci from organisms not previously reported to produce enediyne compounds, namely a chromoprotein enediyne from Streptomyces ghanaensis NRRL B-12104 (sometimes referred to herein as 009C), and a chromoprotein enediyne from Amycolatopsis orientalis ATCC 43491 (sometimes referred to herein as 007A). The presence of an apoprotein encoding gene in 009C and 007A confirms that 009C and 007A produce chromoprotein enediyne compounds.

Comparison of the MACR, CALI, 009C and 007A loci revealed that all loci contain at least one member of each of five (5) protein families. The five protein families are referred to throughout the description and figures by reference to a four-letter designation as indicated in Table 1.

TABLE 1
Family descriptions

Family    Function
PKSE      unusual polyketide synthase, found only in enediyne biosynthetic loci and involved in warhead formation; believed to act iteratively.
TEBC      thioesterase unique to enediyne biosynthetic loci; significant similarity to small (130-150 aa) proteins of the 4-hydroxybenzoyl-CoA thioesterase family in a number of bacteria.
UNBL      unique to enediyne biosynthetic loci; these proteins are rich in basic amino acids and contain several conserved or invariant histidine residues.
UNBV      unique to enediyne biosynthetic loci; secreted proteins; contain putative cleavable N-terminal signal sequence; believed to be associated with stabilization and/or export of the enediyne chromophore and/or late modifications in the biosynthesis of enediyne chromophores.
UNBU      unique to enediyne biosynthetic loci; C-terminal domain homology to bacterial putative ABC transporters and permease transport systems; integral membrane proteins with seven or eight putative membrane-spanning alpha helices; believed to be involved in transport of enediynes and/or intermediates across the cell membrane.

A member of each of the five protein families was found in each of the more than ten biosynthetic loci for chromoprotein and non-chromoprotein enediynes studied. Two of the five protein families, PKSE and TEBC, form a polyketide synthase catalytic complex involved in formation of the warhead structure that distinguishes enediyne compounds. The other three protein families conserved throughout chromoprotein and non-chromoprotein enediyne biosynthetic loci are also associated with the warhead structure that characterizes enediyne compounds. Nucleic acid sequences and polypeptide sequences related to these five protein families form the basis for the compositions and methods of the invention.

We have discovered at least one member of each of the protein families PKSE, TEBC, UNBL, UNBV and UNBU in all of the 10 enediyne biosynthetic loci studied, including MACR, CALI, 009C, 007A, an enediyne biosynthetic locus from Kitasatosporia sp. (sometimes referred to herein as 028D), an enediyne biosynthetic locus from Micromonospora megalomicea (sometimes referred to herein as 054A), an enediyne biosynthetic locus from Saccharothrix aerocolonigenes (sometimes referred to herein as 132H), an enediyne biosynthetic locus from Streptomyces kaniharaensis (sometimes referred to herein as 135E), an enediyne biosynthetic locus from Streptomyces citricolor (sometimes referred to herein as 145B), and the biosynthetic locus for the chromoprotein enediyne neocarzinostatin from Streptomyces carzinostaticus (sometimes referred to herein as NEOC).

The protein families PKSE, TEBC, UNBL, UNBV and UNBU of the present invention are associated with warhead formation in enediyne compounds and are found in both chromoprotein and non-chromoprotein enediyne biosynthetic loci. Members of the protein families PKSE, TEBC, UNBL, UNBV and UNBU found within an enediyne biosynthetic locus are not necessarily present in a single operon and are therefore not necessarily transcriptionally linked to one another. However, the members of the protein families PKSE, TEBC, UNBL, UNBV and UNBU that are found within a single enediyne biosynthetic locus are functionally linked to one another in that they act in a concerted fashion in the production of an enediyne product. Although expression of functionally linked enediyne specific genes encoding members of the PKSE, TEBC, UNBL, UNBV and UNBU protein families may be under control of distinct transcriptional promoters, they may nonetheless be expressed in a concerted fashion.

Due to high overall sequence conservation between members of the PKSE, TEBC, UNBL, UNBV and UNBU protein families, it is expected that members of the PKSE, TEBC, UNBL, UNBV and UNBU protein families may be exchanged for another member of the same protein family while retaining the ability of the new enediyne biosynthetic system to synthesize the warhead structure of an enediyne compound. Thus, it is contemplated that genes encoding a polypeptide from protein families PKSE, TEBC, UNBL, UNBV and UNBU from two or more different enediyne biosynthetic systems may be combined so as to obtain a full complement of the five-gene enediyne cassette of the invention, wherein one or more genes in the enediyne cassette has inherent or engineered optimal properties.
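As a purely bookkeeping illustration of the "full complement" requirement, a proposed mixed cassette might be checked as sketched below. The locus labels are those used in this description; the function name and data layout are hypothetical, and the sketch says nothing about whether a given combination would be functional.

```python
# Illustrative sketch only: checks that a proposed cassette names a source
# locus for each of the five protein families. All names are hypothetical.
FAMILIES = ("PKSE", "TEBC", "UNBL", "UNBV", "UNBU")
KNOWN_LOCI = {"MACR", "CALI", "009C", "007A", "028D",
              "054A", "132H", "135E", "145B", "NEOC"}


def assemble_cassette(source_by_family):
    """Return [(family, locus), ...] in family order, or raise ValueError.

    source_by_family: dict mapping a family designation to the locus label
    from which that family member is taken.
    """
    missing = [f for f in FAMILIES if f not in source_by_family]
    if missing:
        raise ValueError("cassette lacks families: " + ", ".join(missing))
    unknown = sorted(set(source_by_family.values()) - KNOWN_LOCI)
    if unknown:
        raise ValueError("unrecognized locus labels: " + ", ".join(unknown))
    return [(family, source_by_family[family]) for family in FAMILIES]


if __name__ == "__main__":
    mixed = {"PKSE": "CALI", "TEBC": "MACR", "UNBL": "CALI",
             "UNBV": "NEOC", "UNBU": "CALI"}
    for family, locus in assemble_cassette(mixed):
        print(family, "from", locus)
```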

Representative nucleic acid sequences and polypeptide sequences drawn from each of the ten enediyne loci described herein are provided in the accompanying sequence listing as examples of the compositions of the invention. Referring to the sequence listing, a nucleic acid sequence encoding a member of the PKSE protein family of the invention from the biosynthetic locus for macromomycin from Streptomyces macromyceticus (MACR) is provided in SEQ ID NO: 2, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 1. Nucleic acid sequences encoding two members of the TEBC protein family from MACR are provided in SEQ ID NOS: 4 and 6 with the corresponding deduced polypeptide sequences provided in SEQ ID NOS: 3 and 5 respectively. A nucleic acid sequence encoding a member of the UNBL protein family from MACR is provided in SEQ ID NO: 8 with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 7. A nucleic acid sequence encoding a member of the protein family UNBV from MACR is provided in SEQ ID NO: 10 with the corresponding deduced polypeptide provided in SEQ ID NO: 9. A nucleic acid sequence encoding a member of the protein family UNBU from MACR is provided in SEQ ID NO: 12 with the corresponding deduced polypeptide provided in SEQ ID NO: 11.

A nucleic acid sequence encoding a member of the PKSE protein family of the invention from the biosynthetic locus for calicheamicin from Micromonospora echinospora subsp. calichensis (CALI) is provided in SEQ ID NO: 14, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 13. A nucleic acid sequence encoding a member of the TEBC protein family from CALI is provided in SEQ ID NO: 16, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 15. A nucleic acid sequence encoding a member of the UNBL protein family from CALI is provided in SEQ ID NO: 18, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 17. A nucleic acid sequence encoding a member of the UNBV protein family from CALI is provided in SEQ ID NO: 20, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 19. A nucleic acid sequence encoding a member of the UNBU protein family from CALI is provided in SEQ ID NO: 22, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 21.

A nucleic acid sequence encoding a member of the PKSE protein family of the invention from the enediyne biosynthetic locus from Streptomyces ghanaensis (009C) is provided in SEQ ID NO: 24, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 23. A nucleic acid sequence encoding a member of the TEBC protein family from 009C is provided in SEQ ID NO: 26, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 25. A nucleic acid sequence encoding a member of the UNBL protein family from 009C is provided in SEQ ID NO: 28, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 27. A nucleic acid sequence encoding a member of the UNBV protein family from 009C is provided in SEQ ID NO: 30, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 29. A nucleic acid sequence encoding a member of the UNBU protein family from 009C is provided in SEQ ID NO: 32, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 31.

A nucleic acid sequence encoding a member of the PKSE protein family of the invention from the biosynthetic locus for neocarzinostatin from Streptomyces carzinostaticus subsp. neocarzinostaticus (NEOC) is provided in SEQ ID NO: 34, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 33. A nucleic acid sequence encoding a member of the TEBC protein family from NEOC is provided in SEQ ID NO: 36, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 35. A nucleic acid sequence encoding a member of the UNBL protein family from NEOC is provided in SEQ ID NO: 38, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 37. A nucleic acid sequence encoding a member of the UNBV protein family from NEOC is provided in SEQ ID NO: 40, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 39. A nucleic acid sequence encoding a member of the UNBU protein family from NEOC is provided in SEQ ID NO: 42, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 41.

A nucleic acid sequence encoding a member of the PKSE protein family of the invention from the enediyne biosynthetic locus from Amycolatopsis orientalis (007A) is provided in SEQ ID NO: 44, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 43. A nucleic acid sequence encoding a member of the TEBC protein family from 007A is provided in SEQ ID NO: 46, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 45. A nucleic acid sequence encoding a member of the UNBL protein family from 007A is provided in SEQ ID NO: 48, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 47. A nucleic acid sequence encoding a member of the UNBV protein family from 007A is provided in SEQ ID NO: 50, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 49. A nucleic acid sequence encoding a member of the UNBU protein family from 007A is provided in SEQ ID NO: 52, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 51.

A nucleic acid sequence encoding a member of the PKSE protein family of the invention from the enediyne biosynthetic locus from Kitasatosporia sp. (028D) is provided in SEQ ID NO: 54, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 53. A nucleic acid sequence encoding a member of the TEBC protein family from 028D is provided in SEQ ID NO: 56, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 55. A nucleic acid sequence encoding a member of the UNBL protein family from 028D is provided in SEQ ID NO: 58, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 57. A nucleic acid sequence encoding a member of the UNBV protein family from 028D is provided in SEQ ID NO: 60, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 59. A nucleic acid sequence encoding a member of the UNBU protein family from 028D is provided in SEQ ID NO: 62, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 61.

A nucleic acid sequence encoding a member of the PKSE protein family of the invention from the enediyne biosynthetic locus from Micromonospora megalomicea (054A) is provided in SEQ ID NO: 64, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 63. A nucleic acid sequence encoding a member of the TEBC protein family from 054A is provided in SEQ ID NO: 66, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 65. A nucleic acid sequence encoding a member of the UNBL protein family from 054A is provided in SEQ ID NO: 68, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 67. A nucleic acid sequence encoding a member of the UNBV protein family from 054A is provided in SEQ ID NO: 70, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 69. A nucleic acid sequence encoding a member of the UNBU protein family from 054A is provided in SEQ ID NO: 72, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 71.

A nucleic acid sequence encoding a member of the PKSE protein family of the invention from the enediyne biosynthetic locus from Saccharothrix aerocolonigenes (132H) is provided in SEQ ID NO: 74, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 73. A nucleic acid sequence encoding a member of the TEBC protein family from 132H is provided in SEQ ID NO: 76, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 75. A nucleic acid sequence encoding a member of the UNBL protein family from 132H is provided in SEQ ID NO: 78, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 77. A nucleic acid sequence encoding a member of the UNBV protein family from 132H is provided in SEQ ID NO: 80, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 79. A nucleic acid sequence encoding a member of the UNBU protein family from 132H is provided in SEQ ID NO: 82, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 81.

A nucleic acid sequence encoding a member of the PKSE protein family of the invention from the enediyne biosynthetic locus from Streptomyces kaniharaensis (135E) is provided in SEQ ID NO: 84, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 83. A nucleic acid sequence encoding a member of the TEBC protein family from 135E is provided in SEQ ID NO: 86, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 85. A nucleic acid sequence encoding a member of the UNBL protein family from 135E is provided in SEQ ID NO: 88, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 87. A nucleic acid sequence encoding a member of the UNBV protein family from 135E is provided in SEQ ID NO: 90, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 89. A nucleic acid sequence encoding a member of the UNBU protein family from 135E is provided in SEQ ID NO: 92, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 91.

A nucleic acid sequence encoding a member of the PKSE protein family of the invention from the enediyne biosynthetic locus from Streptomyces citricolor (145B) is provided in SEQ ID NO: 94, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 93. A nucleic acid sequence encoding a member of the TEBC protein family from 145B is provided in SEQ ID NO: 96, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 95. A nucleic acid sequence encoding a member of the UNBL protein family from 145B is provided in SEQ ID NO: 98, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 97. A nucleic acid sequence encoding a member of the UNBV protein family from 145B is provided in SEQ ID NO: 100, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 99. A nucleic acid sequence encoding a member of the UNBU protein family from 145B is provided in SEQ ID NO: 102, with the corresponding deduced polypeptide sequence provided in SEQ ID NO: 101.
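The sequence numbering recited in the preceding paragraphs follows a regular pattern: loci are listed in the order MACR, CALI, 009C, NEOC, 007A, 028D, 054A, 132H, 135E, 145B; each polypeptide receives an odd SEQ ID NO; and the encoding nucleic acid receives the next even number, with MACR contributing two TEBC members. The following sketch rebuilds that lookup table from the pattern. It is offered only as a convenience for cross-referencing; the function and key names are hypothetical and the sequence listing itself remains the authority.

```python
# Illustrative sketch only: reconstructs the SEQ ID NO assignments described
# in the text above. Function and key names are hypothetical.
LOCI = ["MACR", "CALI", "009C", "NEOC", "007A",
        "028D", "054A", "132H", "135E", "145B"]
FAMILIES = ["PKSE", "TEBC", "UNBL", "UNBV", "UNBU"]


def build_seq_id_table():
    """Map (locus, family) to a list of polypeptide/nucleic-acid SEQ ID NOS."""
    table = {}
    next_id = 1
    for locus in LOCI:
        for family in FAMILIES:
            # MACR is described as having two TEBC members (SEQ ID NOS: 3, 5).
            copies = 2 if (locus, family) == ("MACR", "TEBC") else 1
            for _ in range(copies):
                table.setdefault((locus, family), []).append(
                    {"polypeptide": next_id, "nucleic_acid": next_id + 1}
                )
                next_id += 2
    return table


def polypeptide_ids(table, family):
    """Return all polypeptide SEQ ID NOS for one family, in locus order."""
    return [entry["polypeptide"]
            for locus in LOCI
            for entry in table[(locus, family)]]


if __name__ == "__main__":
    seq_table = build_seq_id_table()
    for fam in FAMILIES:
        print(fam, polypeptide_ids(seq_table, fam))
```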

As used herein, PKSE refers to a family of polyketide synthase proteins that are uniquely associated with enediyne biosynthetic loci and that are involved in synthesis of the warhead structure that characterizes enediyne compounds. Representative members of the protein family PKSE include the polypeptides of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, and 93. Other members of protein family PKSE include polypeptides having at least 75%, preferably 80%, more preferably 85%, still more preferably 90% and most preferably 95% or more homology to a polypeptide having the sequence of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93 as determined using the BLASTP algorithm with the default parameters and having the ability to substitute for another PKSE protein and retaining the ability to act in a concerted fashion with a TEBC protein during synthesis of a warhead structure of an enediyne compound. Other members of the protein family PKSE include fragments, analogs and derivatives of the above polypeptides, which fragments, analogs and derivatives have the ability to substitute for another PKSE protein and retain the ability to act in a concerted fashion with TEBC during synthesis of a warhead structure of an enediyne compound.

TEBC refers to a family of thioesterase proteins unique to enediyne biosynthesis which together with a protein from the protein family PKSE forms an enediyne polyketide catalytic complex and is involved in synthesis of a warhead structure that characterizes enediyne compounds. Representative members of the protein family TEBC include the polypeptides of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, and 95. Other members of protein family TEBC include polypeptides having at least 75%, preferably 80%, more preferably 85%, still more preferably 90% and most preferably 95% or more homology to a polypeptide having the sequence of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, and 95 as determined using the BLASTP algorithm with the default parameters and retaining the ability to act in a concerted fashion with a protein from the protein family PKSE during synthesis of a warhead structure in an enediyne compound. Other members of the protein family TEBC include fragments, analogs and derivatives of the above polypeptides, which fragments, analogs and derivatives have the ability to substitute for another TEBC protein and retain the ability to act in a concerted fashion with a PKSE protein during formation of a warhead structure in an enediyne compound.

UNBL refers to a family of proteins indicative of enediyne biosynthetic loci and which are rich in basic amino acids and contain several conserved or invariant histidine residues. Representative members of the protein family UNBL include the polypeptides of SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87 and 97. Other members of protein family UNBL include polypeptides having at least 75%, preferably 80%, more preferably 85%, still more preferably 90% and most preferably 95% or more homology to a polypeptide having the sequence of SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87 and 97 as determined using the BLASTP algorithm with the default parameters and that are present in a gene cluster associated with the biosynthesis of an enediyne compound. Other members of the protein family UNBL include fragments, analogs and derivatives of the above polypeptides, which fragments, analogs and derivatives have the ability to substitute for another UNBL protein and retain the ability to act in a concerted fashion with genes in an enediyne biosynthetic locus to form a warhead structure of an enediyne compound.

UNBV refers to a family of proteins indicative of enediyne biosynthetic loci and which may contain a cleavable N-terminal signal sequence. Representative members of the protein family UNBV include the polypeptides of SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89 and 99. Other members of protein family UNBV include polypeptides having at least 75%, preferably 80%, more preferably 85%, still more preferably 90% and most preferably 95% or more homology to a polypeptide having the sequence of SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89 and 99 as determined using the BLASTP algorithm with the default parameters and that are present in a gene cluster associated with the biosynthesis of an enediyne compound. Other members of the protein family UNBV include fragments, analogs and derivatives of the above polypeptides, which fragments, analogs and derivatives have the ability to substitute for another UNBV protein and retain the ability to act in a concerted fashion with genes in an enediyne biosynthetic locus to form a warhead structure in an enediyne compound.

UNBU refers to a family of membrane proteins indicative of enediyne biosynthetic loci. Representative members of the protein family UNBU include the polypeptides of SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91 and 101. Other members of protein family UNBU include polypeptides having at least 75%, preferably 80%, more preferably 85%, still more preferably 90% and most preferably 95% or more homology to a polypeptide having the sequence of SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91 and 101 as determined using the BLASTP algorithm with the default parameters and that are present in a gene cluster associated with the biosynthesis of an enediyne compound. Other members of the protein family UNBU include fragments, analogs and derivatives of the above polypeptides, which fragments, analogs and derivatives have the ability to substitute for another UNBU protein and retain the ability to act in a concerted fashion with genes in an enediyne biosynthetic locus to form the warhead structure in an enediyne compound.
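Each of the five family definitions above recites the same graded homology thresholds (75%, 80%, 85%, 90%, 95%). A minimal sketch of that grading is given below. It covers only the numeric criterion; the functional requirements recited in the definitions (for example, the ability to substitute for another member of the family) are not modeled, and the tier labels and function name are hypothetical.

```python
# Illustrative sketch only: maps a percent-homology value onto the graded
# thresholds recited in the family definitions. Labels are hypothetical.
HOMOLOGY_TIERS = [
    (95.0, "most preferred"),
    (90.0, "still more preferred"),
    (85.0, "more preferred"),
    (80.0, "preferred"),
    (75.0, "minimum"),
]


def homology_tier(percent_homology):
    """Return the highest tier met, or None if below 75%."""
    for threshold, label in HOMOLOGY_TIERS:
        if percent_homology >= threshold:
            return label
    return None


if __name__ == "__main__":
    for value in (74.9, 75.0, 88.2, 97.5):
        print(value, homology_tier(value))
```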

“Enediyne producer” or “enediyne-producing organism” refers to a microorganism which carries the genetic information necessary to produce an enediyne compound, whether or not the organism is known to produce an enediyne product. The terms apply equally to organisms in which the genetic information to produce an enediyne compound is found in the organism as it exists in its natural environment, and to organisms in which the genetic information is introduced by recombinant techniques. For the sake of particularity, specific organisms contemplated herein include organisms of the family Micromonosporaceae, of which preferred genera include Micromonospora, Actinoplanes and Dactylosporangium; the family Streptomycetaceae, of which preferred genera include Streptomyces and Kitasatospora; the family Pseudonocardiaceae, of which preferred genera are Amycolatopsis and Saccharopolyspora; and the family Actinosynnemataceae, of which preferred genera include Saccharothrix and Actinosynnema; however the terms are intended to encompass all organisms containing genetic information necessary to produce an enediyne compound.

“Enediyne biosynthetic gene product” refers to any enzyme involved in the biosynthesis of an enediyne, whether a chromoprotein enediyne or a non-chromoprotein enediyne. These gene products are located in any enediyne biosynthetic locus in an organism of the family Micromonosporaceae, of which preferred genera include Micromonospora, Actinoplanes and Dactylosporangium; the family Streptomycetaceae, of which preferred genera include Streptomyces and Kitasatospora; the family Pseudonocardiaceae, of which preferred genera are Amycolatopsis and Saccharopolyspora. For the sake of particularity, the enediyne biosynthetic loci described herein are associated with Streptomyces macromyceticus, Micromonospora echinospora subsp. calichensis, Streptomyces ghanaensis, Streptomyces carzinostaticus subsp. neocarzinostaticus, Amycolatopsis orientalis, Kitasatosporia sp., Micromonospora megalomicea, Saccharothrix aerocolonigenes, Streptomyces kaniharaensis, and Streptomyces citricolor; however, it should be understood that this term encompasses enediyne biosynthetic enzymes (and genes encoding such enzymes) isolated from any microorganism of the genus Streptomyces, Micromonospora, Amycolatopsis, Kitasatosporia, or Saccharothrix and furthermore that these genes may have novel homologues in any microorganism, actinomycete or non-actinomycete, that falls within the scope of the claims stated herein. Specific embodiments include the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101.

The term “isolated” means that the material is removed from its original environment, e.g. the natural environment if it is naturally occurring. For example, a naturally-occurring polynucleotide or polypeptide present in a living organism is not isolated, but the same polynucleotide or polypeptide, separated from some or all of the coexisting materials in the natural system, is isolated. Such polynucleotides could be part of a vector and/or such polynucleotides or polypeptides could be part of a composition, and still be isolated in that such vector or composition is not part of its natural environment.

The term “purified” does not require absolute purity; rather, it is intended as a relative definition. Individual nucleic acids obtained from a library have been conventionally purified to electrophoretic homogeneity. The purified nucleic acids of the present invention have been purified from the remainder of the genomic DNA in the organism by at least 10⁴ to 10⁶ fold. However, the term “purified” also includes nucleic acids which have been purified from the remainder of the genomic DNA or from other sequences in a library or other environment by at least one order of magnitude, preferably two or three orders of magnitude, and more preferably four or five orders of magnitude.
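The relationship between a fold-purification value and the "orders of magnitude" language above is simple arithmetic, sketched below for illustration. The function names are hypothetical, and how a fold-purification value would be measured in practice is outside the scope of the sketch.

```python
# Illustrative sketch only: converts a fold-purification value into whole
# orders of magnitude. Function names are hypothetical.
def orders_of_magnitude(fold_purification):
    """Return the number of whole powers of ten contained in the value."""
    if fold_purification <= 0:
        raise ValueError("fold purification must be positive")
    orders = 0
    while fold_purification >= 10 ** (orders + 1):
        orders += 1
    return orders


def meets_purified_definition(fold_purification):
    """True if purified by at least one order of magnitude (broadest reading)."""
    return orders_of_magnitude(fold_purification) >= 1


if __name__ == "__main__":
    for fold in (5, 10, 10**4, 10**6):
        print(fold, orders_of_magnitude(fold), meets_purified_definition(fold))
```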

“Recombinant” means that the nucleic acid is adjacent to “backbone” nucleic acid to which it is not adjacent in its natural environment. “Enriched” nucleic acids represent 5% or more of the number of nucleic acid inserts in a population of nucleic acid backbone molecules. “Backbone” molecules include nucleic acids such as expression vectors, self-replicating nucleic acids, viruses, integrating nucleic acids, and other vectors or nucleic acids used to maintain or manipulate a nucleic acid of interest. Preferably, the enriched nucleic acids represent 15% or more, more preferably 50% or more, and most preferably 90% or more, of the number of nucleic acid inserts in the population of recombinant backbone molecules.

“Recombinant polypeptides” or “recombinant proteins” refers to polypeptides or proteins produced by recombinant DNA techniques, i.e. produced from cells transformed by an exogenous DNA construct encoding the desired polypeptide or protein. “Synthetic” polypeptides or proteins are those prepared by chemical synthesis.

The term “gene” means the segment of DNA involved in producing a polypeptide chain; it includes regions preceding and following the coding region (leader and trailer) as well as, where applicable, intervening regions (introns) between individual coding segments (exons).

The term “operon” means a transcriptional gene cassette under the control of a single transcriptional promoter, which gene cassette encodes polypeptides that may act in a concerted fashion to carry out a biochemical pathway and/or cellular process.

A DNA or nucleotide “coding sequence” or “sequence encoding” a particular polypeptide or protein is a DNA sequence which is transcribed and translated into a polypeptide or protein when placed under the control of appropriate regulatory sequences.

“Oligonucleotide” refers to a nucleic acid, generally of at least 10, preferably 15 and more preferably at least 20 nucleotides, preferably no more than 100 nucleotides, that is hybridizable to a genomic DNA molecule, a cDNA molecule, or an mRNA molecule encoding a gene, mRNA, cDNA or other nucleic acid of interest.

A promoter sequence is “operably linked to” a coding sequence when RNA polymerase which initiates transcription at the promoter transcribes the coding sequence into mRNA.

“Plasmids” are designated herein by a lower case p followed by capital letters and/or numbers. The starting plasmids herein are commercially available, publicly available on an unrestricted basis, or can be constructed from available plasmids in accord with published procedures. In addition, equivalent plasmids to those described herein are known in the art and will be apparent to the skilled artisan.

“Digestion” of DNA refers to enzymatic cleavage of the DNA with a restriction enzyme that acts only at certain sequences in the DNA. The various restriction enzymes used herein are commercially available and their reaction conditions, cofactors and other requirements were used as would be known to the ordinary skilled artisan. For analytical purposes, typically 1 μg of plasmid or DNA fragment is used with about 2 units of enzyme in about 20 μl of buffer solution. For the purpose of isolating DNA fragments for plasmid construction, typically 5 to 50 μg of DNA are digested with 20 to 250 units of enzyme in a larger volume. Appropriate buffers and substrate amounts for particular enzymes are specified by the manufacturer. Incubation times of about 1 hour at 37° C. are ordinarily used, but may vary in accordance with the supplier's instructions. After digestion, gel electrophoresis may be performed to isolate the desired fragment.

Two deposits have been made with the International Depositary Authority of Canada, Bureau of Microbiology, Health Canada, 1015 Arlington Street, Winnipeg, Manitoba, Canada R3E 3R2 on Apr. 3, 2002. The first deposit is an E. coli DH10B strain harbouring a cosmid clone (020CN) of a partial biosynthetic locus for macromomycin from Streptomyces macromyceticus, including open reading frames coding for the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9 and 11, which deposit was assigned deposit accession number IDAC 030402-1. The second deposit is an E. coli DH10B strain harbouring a cosmid clone (061CR) of a partial biosynthetic locus for calicheamicin from Micromonospora echinospora subsp. calichensis, including open reading frames coding for the polypeptides of SEQ ID NOS: 13, 15, 17, 19, and 21, which deposit was assigned accession number IDAC 030402-2. The E. coli strain deposits are referred to herein as “the deposited strains”.

The deposited strains comprise a member from each of the protein families PKSE, TEBC, UNBL, UNBV and UNBU drawn from a chromoprotein enediyne biosynthetic locus (macromomycin) and a member from each of the protein families PKSE, TEBC, UNBL, UNBV and UNBU drawn from a non-chromoprotein enediyne biosynthetic locus (calicheamicin). The sequence of the polynucleotides comprised in the deposited strains, as well as the amino acid sequence of any polypeptide encoded thereby, are controlling in the event of any conflict with any description of sequences herein.

The deposit of the deposited strains has been made under the terms of the Budapest Treaty on the International Recognition of the Deposit of Micro-organisms for Purposes of Patent Procedure. The deposited strains will be irrevocably and without restriction or condition released to the public upon the issuance of a patent. The deposited strains are provided merely as a convenience to those skilled in the art and are not an admission that a deposit is required for enablement, such as that required under 35 U.S.C. §112. A license may be required to make, use or sell the deposited strains or nucleic acids therein, and compounds derived therefrom, and no such license is hereby granted.

Representative nucleic acid sequences encoding members of the five protein families are provided in the accompanying sequence listing as SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102. Representative polypeptides representing members of the five protein families are provided in the accompanying sequence listing as SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101.

One aspect of the present invention is an isolated, purified, or enriched nucleic acid comprising one of the sequences of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102 or the sequences complementary thereto. The isolated, purified or enriched nucleic acids may comprise DNA, including cDNA, genomic DNA, and synthetic DNA. The DNA may be double stranded or single stranded, and if single stranded may be the coding or non-coding (anti-sense) strand. Alternatively, the isolated, purified or enriched nucleic acids may comprise RNA.
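As an illustration of the terms used in the preceding paragraph, the following Python sketch derives the complementary (anti-sense) strand of a DNA sequence and enumerates its fragments of a chosen number of consecutive bases. The function names and the short test sequence are hypothetical and are not drawn from the sequence listing; the sketch assumes an unambiguous A/C/G/T alphabet.

```python
# Hypothetical helpers; not part of the disclosed sequences or methods.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the complementary strand, written 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


def consecutive_fragments(sequence, length):
    """Yield every fragment of exactly `length` consecutive bases.

    A sequence shorter than `length` yields nothing.
    """
    for start in range(len(sequence) - length + 1):
        yield sequence[start:start + length]


if __name__ == "__main__":
    toy = "ATGGCTGAAACC"  # illustrative sequence only
    print(reverse_complement(toy))
    print(list(consecutive_fragments(toy, 10)))
```

Any longer fragment contains a shorter one, so enumerating fragments at the smallest length of interest is enough to cover the "at least" wording.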

As discussed in more detail below, the isolated, purified or enriched nucleic acids of one of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102 may be used to prepare one of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101.

Accordingly, another aspect of the present invention is an isolated, purified or enriched nucleic acid which encodes one of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101. The coding sequences of these nucleic acids may be identical to one of the coding sequences of one of the nucleic acids of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, or a fragment thereof or may be different coding sequences which encode one of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101 as a result of the redundancy or degeneracy of the genetic code. The genetic code is well known to those of skill in the art and can be obtained, for example, from Stryer, Biochemistry, 3rd edition, W. H. Freeman & Co., New York.

The isolated, purified or enriched nucleic acid which encodes one of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, may include, but is not limited to: (1) only the coding sequences of one of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102; (2) the coding sequences of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102 and additional coding sequences, such as leader sequences or proprotein sequences; or (3) the coding sequences of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102 and non-coding sequences, such as introns or non-coding sequences 5′ and/or 3′ of the coding sequence. Thus, as used herein, the term “polynucleotide encoding a polypeptide” encompasses a polynucleotide which includes only coding sequence for the polypeptide as well as a polynucleotide which includes additional coding and/or non-coding sequence.

The invention relates to polynucleotides based on SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102 but having polynucleotide changes that are “silent”, for example changes which do not alter the amino acid sequence encoded by the polynucleotides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102. The invention also relates to polynucleotides which have nucleotide changes which result in amino acid substitutions, additions, deletions, fusions and truncations of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101. Such nucleotide changes may be introduced using techniques such as site directed mutagenesis, random chemical mutagenesis, exonuclease III deletion, and other recombinant DNA techniques.
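The notion of a “silent” change can be illustrated with a short Python sketch that translates two coding sequences and checks whether they differ at the nucleotide level while encoding the same amino acid sequence. The sketch assumes the standard genetic code; the function names and the nine-base test sequences are hypothetical and are not taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest); "*" marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(coding_sequence):
    """Translate a DNA coding sequence, ignoring any trailing partial codon."""
    dna = coding_sequence.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_silent_variant(reference, variant):
    """True if the sequences differ but encode the same amino acids."""
    return reference != variant and translate(reference) == translate(variant)


if __name__ == "__main__":
    reference = "ATGGCTGAA"   # hypothetical sequence
    silent = "ATGGCCGAG"      # third-position changes only
    missense = "ATGGCTGAT"    # last codon now encodes a different residue
    print(translate(reference), is_silent_variant(reference, silent))
    print(translate(missense), is_silent_variant(reference, missense))
```

A change that fails this check falls into the second group described above: nucleotide changes that alter the encoded polypeptide.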

The isolated, purified or enriched nucleic acids of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, or the sequences complementary thereto may be used as probes to identify and isolate DNAs encoding the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101 respectively.

For example, a genomic DNA library may be constructed from a sample microorganism or a sample containing a microorganism capable of producing an enediyne. The genomic DNA library is then contacted with a probe comprising a coding sequence or a fragment of the coding sequence, encoding one of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or a fragment thereof under conditions which permit the probe to specifically hybridize to sequences complementary thereto. In one embodiment, the probe is an oligonucleotide of about 10 to about 30 nucleotides in length designed based on a nucleic acid of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102. Genomic DNA clones which hybridize to the probe are then detected and isolated. Procedures for preparing and identifying DNA clones of interest are disclosed in Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. 1997; and Sambrook et al., Molecular Cloning: A Laboratory Manual 2d Ed., Cold Spring Harbor Laboratory Press, 1989. In another embodiment, the probe is a restriction fragment or a PCR amplified nucleic acid derived from SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102.

The isolated, purified or enriched nucleic acids of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, or the sequences complementary thereto may be used as probes to identify and isolate related nucleic acids. In some embodiments, the related nucleic acids may be genomic DNAs (or cDNAs) from potential enediyne producers. In one embodiment, isolated, purified or enriched nucleic acids of SEQ ID NOS: 2, 14, 24, 34, 44, 54, 64, 74, 84, 94, the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 2, 14, 24, 34, 44, 54, 64, 74, 84, 94 or the sequences complementary thereto may be used as probes to identify and isolate related nucleic acids. In such procedures, a nucleic acid sample containing nucleic acids from a potential enediyne-producer is contacted with the probe under conditions which permit the probe to specifically hybridize to related sequences. The nucleic acid sample may be a genomic DNA (or cDNA) library from the potential enediyne-producer. Hybridization of the probe to nucleic acids is then detected using any of the methods known in the art, including those referred to herein.

Hybridization may be carried out under conditions of low stringency, moderate stringency or high stringency. As an example of nucleic acid hybridization, a polymer membrane containing immobilized denatured nucleic acids is first prehybridized for 30 minutes at 45° C. in a solution consisting of 0.9 M NaCl, 50 mM NaH₂PO₄, pH 7.0, 5.0 mM Na₂EDTA, 0.5% SDS, 10× Denhardt's, and 0.5 mg/ml polyriboadenylic acid. Approximately 2×10⁷ cpm (specific activity 4-9×10⁸ cpm/μg) of ³²P end-labeled oligonucleotide probe are then added to the solution. After 12-16 hours of incubation, the membrane is washed for 30 minutes at room temperature in 1×SET (150 mM NaCl, 20 mM Tris hydrochloride, pH 7.8, 1 mM Na₂EDTA) containing 0.5% SDS, followed by a 30 minute wash in fresh 1×SET at Tm−10° C. for the oligonucleotide probe, where Tm is the melting temperature. The membrane is then exposed to autoradiographic film for detection of hybridization signals.

By varying the stringency of the hybridization conditions used to identify nucleic acids, such as genomic DNAs or cDNAs, which hybridize to the detectable probe, nucleic acids having different levels of homology to the probe can be identified and isolated. Stringency may be varied by conducting the hybridization at varying temperatures below the melting temperatures of the probes. The melting temperature of the probe may be calculated using the following formulas:

For oligonucleotide probes between 14 and 70 nucleotides in length the melting temperature (Tm) in degrees Celsius may be calculated using the formula: Tm=81.5+16.6(log [Na+])+0.41(fraction G+C)−(600/N) where N is the length of the oligonucleotide.

If the hybridization is carried out in a solution containing formamide, the melting temperature may be calculated using the equation Tm=81.5+16.6(log [Na+])+0.41(fraction G+C)−(0.63% formamide)−(600/N) where N is the length of the probe.
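The two melting temperature formulas can be combined into one function, sketched below in Python. Several points are assumptions rather than statements of the text: the logarithm is taken as base 10, [Na+] is in mol/L, the formamide term is 0.63 multiplied by the percent formamide, and the G+C term is supplied as a percentage from 0 to 100, which is how a 0.41 coefficient is conventionally applied. If a 0 to 1 fraction is intended, the argument would need to be rescaled. The function and parameter names are hypothetical.

```python
import math


def melting_temperature(na_molar, gc_percent, length, formamide_percent=0.0):
    """Estimate Tm in degrees Celsius.

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(G+C) - 0.63*(% formamide) - 600/N

    Assumptions (not stated explicitly in the text): base-10 logarithm,
    [Na+] in mol/L, and G+C given as a percentage (0-100).
    With formamide_percent=0 this reduces to the formamide-free formula.
    """
    if na_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    if length <= 0:
        raise ValueError("probe length must be positive")
    return (
        81.5
        + 16.6 * math.log10(na_molar)
        + 0.41 * gc_percent
        - 0.63 * formamide_percent
        - 600.0 / length
    )


if __name__ == "__main__":
    # Hypothetical 20-mer, 50% G+C, 1 M Na+, with and without 50% formamide.
    print(round(melting_temperature(1.0, 50.0, 20), 1))
    print(round(melting_temperature(1.0, 50.0, 20, formamide_percent=50.0), 1))
```

Keeping formamide as an optional argument makes the relationship between the two formulas explicit: they differ only by the formamide term.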

Prehybridization may be carried out in 6×SSC, 5× Denhardt's reagent, 0.5% SDS, 0.1 mg/ml denatured fragmented salmon sperm DNA or 6×SSC, 5× Denhardt's reagent, 0.5% SDS, 0.1 mg/ml denatured fragmented salmon sperm DNA, 50% formamide. The compositions of the SSC and Denhardt's solutions are listed in Sambrook et al., supra.

Hybridization is conducted by adding the detectable probe to the hybridization solutions listed above. Where the probe comprises double stranded DNA, it is denatured by incubating at elevated temperatures and quickly cooling before addition to the hybridization solution. It may also be desirable to similarly denature single stranded probes to eliminate or diminish formation of secondary structures or oligomerization. The filter is contacted with the hybridization solution for a sufficient period of time to allow the probe to hybridize to cDNAs or genomic DNAs containing sequences complementary thereto or homologous thereto. For probes over 200 nucleotides in length, the hybridization may be carried out at 15-25° C. below the Tm. For shorter probes, such as oligonucleotide probes, the hybridization may be conducted at 5-10° C. below the Tm. Preferably, the hybridization is conducted in 6×SSC, for shorter probes. Preferably, the hybridization is conducted in 50% formamide containing solutions, for longer probes.
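The temperature guidance in the preceding paragraph can be expressed as a small helper that returns the suggested hybridization range for a given Tm and probe length. This is a sketch under one stated assumption: every probe of 200 nucleotides or fewer is treated as a “shorter” probe, whereas the text only contrasts probes over 200 nucleotides with shorter probes such as oligonucleotides. The function name is hypothetical.

```python
def hybridization_window(tm_celsius, probe_length):
    """Return (lowest, highest) suggested hybridization temperature in deg C.

    Probes over 200 nt: 15-25 deg C below Tm.
    Shorter probes (assumed here to mean 200 nt or fewer): 5-10 deg C below Tm.
    """
    if probe_length <= 0:
        raise ValueError("probe length must be positive")
    if probe_length > 200:
        nearest_offset, farthest_offset = 15.0, 25.0
    else:
        nearest_offset, farthest_offset = 5.0, 10.0
    return (tm_celsius - farthest_offset, tm_celsius - nearest_offset)


if __name__ == "__main__":
    print(hybridization_window(72.0, 20))    # oligonucleotide probe
    print(hybridization_window(90.0, 500))   # longer probe
```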

All the foregoing hybridizations would be considered to be examples of hybridization performed under conditions of high stringency.

Following hybridization, the filter is washed for at least 15 minutes in 2×SSC, 0.1% SDS at room temperature or higher, depending on the desired stringency. The filter is then washed with 0.1×SSC, 0.5% SDS at room temperature (again) for 30 minutes to 1 hour.

Nucleic acids which have hybridized to the probe are identified by autoradiography or other conventional techniques.

The above procedure may be modified to identify nucleic acids having decreasing levels of homology to the probe sequence. For example, to obtain nucleic acids of decreasing homology to the detectable probe, less stringent conditions may be used. For example, the hybridization temperature may be decreased in increments of 5° C. from 68° C. to 42° C. in a hybridization buffer having a Na+ concentration of approximately 1 M. Following hybridization, the filter may be washed with 2×SSC, 0.5% SDS at the temperature of hybridization. These conditions are considered to be “moderate stringency” conditions above 50° C. and “low stringency” conditions below 50° C. A specific example of “moderate stringency” hybridization conditions is when the above hybridization is conducted at 55° C. A specific example of “low stringency” hybridization conditions is when the above hybridization is conducted at 45° C.

Alternatively, the hybridization may be carried out in buffers, such as 6×SSC, containing formamide at a temperature of 42° C. In this case, the concentration of formamide in the hybridization buffer may be reduced in 5% increments from 50% to 0% to identify clones having decreasing levels of homology to the probe. Following hybridization, the filter may be washed with 6×SSC, 0.5% SDS at 50° C. These conditions are considered to be “moderate stringency” conditions above 25% formamide and “low stringency” conditions below 25% formamide. A specific example of “moderate stringency” hybridization conditions is when the above hybridization is conducted at 30% formamide. A specific example of “low stringency” hybridization conditions is when the above hybridization is conducted at 10% formamide.
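The two reduced-stringency series and their classification thresholds can be summarized in a short Python sketch. It applies the thresholds literally (50° C. for the temperature series, 25% for the formamide series); the text does not classify the boundary values themselves, so the sketch reports them as unclassified. All names are hypothetical.

```python
def stringency_by_temperature(temperature_c):
    """Classify a hybridization in the roughly 1 M Na+ temperature series."""
    if temperature_c > 50:
        return "moderate"
    if temperature_c < 50:
        return "low"
    return "unclassified"  # exactly 50 deg C is not assigned in the text


def stringency_by_formamide(formamide_percent):
    """Classify a hybridization in the 42 deg C formamide series."""
    if formamide_percent > 25:
        return "moderate"
    if formamide_percent < 25:
        return "low"
    return "unclassified"  # exactly 25% is not assigned in the text


def temperature_series(start=68, stop=42, step=5):
    """Temperatures stepping down from `start` without passing `stop`."""
    return list(range(start, stop - 1, -step))


def formamide_series(start=50, stop=0, step=5):
    """Formamide percentages stepping down from `start` to `stop`."""
    return list(range(start, stop - 1, -step))


if __name__ == "__main__":
    for temperature in temperature_series():
        print(temperature, stringency_by_temperature(temperature))
    for percent in formamide_series():
        print(percent, stringency_by_formamide(percent))
```

Note that 5° C. steps starting at 68° C. end at 43° C. rather than exactly 42° C.; the sketch stops at the last step that does not pass the lower bound.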

Nucleic acids which have hybridized to the probe are identified by autoradiography or other conventional techniques.

For example, the preceding methods may be used to isolate nucleic acids having a sequence with at least 97%, at least 95%, at least 90%, at least 85%, at least 80%, or at least 70% homology to a nucleic acid sequence selected from the group consisting of the sequences of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, fragments comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive bases thereof, and the sequences complementary thereto. Homology may be measured using BLASTN version 2.0 with the default parameters. For example, the homologous polynucleotides may have a coding sequence which is a naturally occurring allelic variant of one of the coding sequences described herein. Such an allelic variant may have a substitution, deletion or addition of one or more nucleotides when compared to the nucleic acids of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, or the sequences complementary thereto.

Additionally, the above procedures may be used to isolate nucleic acids which encode polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, or at least 70% homology to a polypeptide having the sequence of one of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof as determined using the BLASTP version 2.2.2 algorithm with default parameters.

Structural features common to the biosynthesis of all enediyne compounds require one or more proteins selected from a group of 5 specific protein families, namely PKSE, TEBC, UNBL, UNBV and UNBU. Thus, a polypeptide representing a member of any one of these five protein families or a polynucleotide encoding a polypeptide representing a member of any one of these five protein families is considered indicative of an enediyne gene cluster, an enediyne natural product or an enediyne producing organism. It is not necessary that a member of each of the five protein families considered indicative of an enediyne compound be detected to identify an enediyne biosynthetic locus and an enediyne-producing organism. Rather, the presence of at least one, preferably two, more preferably three, still more preferably four, and most preferably five of the protein families PKSE, TEBC, UNBL, UNBV and UNBU indicates the presence of an enediyne natural product, an enediyne biosynthetic locus or an enediyne producing organism.

To identify an enediyne natural product, an enediyne gene cluster or an enediyne-producing organism, nucleic acids from cultivated microorganisms or from an environmental sample, e.g. soil, potentially harboring an organism having the genetic capacity to produce an enediyne compound may be contacted with a probe based on nucleotide sequences encoding a member of the five protein families PKSE, TEBC, UNBL, UNBV and UNBU.

In such procedures, nucleic acids are obtained from cultivated microorganisms or from an environmental sample potentially harboring an organism having the genetic capacity to produce an enediyne compound. The nucleic acids are contacted with probes designed based on the teachings and compositions of the invention under conditions which permit the probe to specifically hybridize to any complementary sequences indicative of the presence of a member of the PKSE, TEBC, UNBL, UNBV and UNBU protein families of the invention. The presence of at least one, preferably two, more preferably three, still more preferably four or five of the PKSE, TEBC, UNBL, UNBV and UNBU protein families indicates the presence of an enediyne gene cluster or an enediyne producing organism.
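The counting rule in the preceding paragraph can be sketched in Python as a tally of which of the five protein families produced a hybridization signal in a sample. The family names come from the text; the function names, the tier labels and the idea of representing detections as a list of family names are illustrative assumptions, and how a signal is assigned to a family is outside the scope of the sketch.

```python
FAMILIES = ("PKSE", "TEBC", "UNBL", "UNBV", "UNBU")


def families_present(detected):
    """Return the distinct enediyne-associated families found, in fixed order.

    `detected` is any iterable of family names reported for a sample;
    names outside the five families and repeated names are ignored.
    """
    found = set(detected)
    return [family for family in FAMILIES if family in found]


def preference_tier(detected):
    """Map the number of distinct families found to the wording of the text."""
    count = len(families_present(detected))
    if count == 0:
        return "not indicated"
    if count == 1:
        return "at least one"
    if count == 2:
        return "preferably two"
    if count == 3:
        return "more preferably three"
    return "still more preferably four or five"


if __name__ == "__main__":
    sample_hits = ["UNBV", "PKSE", "PKSE", "other"]  # hypothetical sample
    print(families_present(sample_hits), preference_tier(sample_hits))
```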

Diagnostic nucleic acid sequences encoding members of the PKSE, TEBC, UNBL, UNBV and UNBU protein families for identifying enediyne genes, biosynthetic loci, and microorganisms that harbor such genes or gene clusters may be employed on complex mixtures of microorganisms such as those from environmental samples (e.g., soil). A mixture of microorganisms refers to a heterogeneous population of microorganisms consisting of more than one species or strain. In the absence of amplification outside of its natural habitat, such a mixture of microorganisms is said to be uncultured. A cultured mixture of microorganisms may be obtained by amplification or propagation outside of its natural habitat by in vitro culture using various growth media that provide essential nutrients. However, depending on the growth medium used, the amplification may preferentially result in amplification of a sub-population of the mixture and hence may not always be desirable. If desired, a pure culture representing a single species or strain may be obtained from either a cultured or uncultured mixture of microorganisms by established microbiological techniques such as serial dilution followed by growth on solid media so as to isolate individual colony forming units.

Enediyne biosynthetic genes and/or enediyne biosynthetic gene clusters may be identified from either a pure culture or cultured or uncultured mixtures of microorganisms employing the diagnostic nucleic acid sequences disclosed in this invention by experimental techniques such as PCR, hybridization, or shotgun sequencing followed by bioinformatic analysis of the sequence data. The identification of one or more members of the protein families PKSE, TEBC, UNBL, UNBV and UNBU or enediyne gene clusters including one or more members of the protein families PKSE, TEBC, UNBL, UNBV and UNBU in a pure culture of a single organism directly distinguishes such an enediyne-producer. The identification of one or more members of the protein families PKSE, TEBC, UNBL, UNBV and UNBU or enediyne gene clusters including one or more members of the protein families PKSE, TEBC, UNBL, UNBV and UNBU in a cultured or uncultured mixture of microorganisms requires further steps to identify and isolate the microorganism(s) that harbor(s) them so as to obtain pure cultures of such microorganisms.

By way of example, the colony lift technique (Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. 1997; and Sambrook et al., Molecular Cloning: A Laboratory Manual 2d Ed., Cold Spring Harbor Laboratory Press, 1989) may be used to identify microorganisms that harbour enediyne genes and/or enediyne biosynthetic loci from a cultured mixture of microorganisms. In such a procedure, the mixture of microorganisms is grown on an appropriate solid medium. The resulting colony forming units are replicated on a solid matrix such as a nylon membrane. The membrane is contacted with detectable diagnostic nucleic acid sequences, the positive colony forming units are identified, and the corresponding colony forming units on the original medium are identified, purified, and amplified.

Nucleic acids encoding a member of the protein families PKSE, TEBC, UNBL, UNBV and UNBU may be used to survey a number of environmental samples for the presence of organisms that have the potential to produce enediyne compounds, i.e., those organisms that contain enediyne biosynthetic genes and/or an enediyne biosynthetic locus. One protocol for use of a survey to identify polypeptides encoded by DNA isolated from uncultured mixtures of microorganisms is outlined in Seow et al. (1997) J. Bacteriol. Vol. 179 pp. 7360-7368.

Where necessary, conditions which permit the probe to specifically hybridize to complementary sequences from an enediyne-producer may be determined by placing a probe based on a member of the protein families PKSE, TEBC, UNBL, UNBV and UNBU in contact with complementary sequences obtained from an enediyne-producer as well as control sequences which are not from an enediyne-producer. In some analyses, the control sequences may be from organisms related to enediyne-producers. Alternatively, the control sequences are not related to enediyne-producers. Hybridization conditions, such as the salt concentration of the hybridization buffer, the formamide concentration of the hybridization buffer, or the hybridization temperature, may be varied to identify conditions which allow the probe to hybridize specifically to nucleic acids from enediyne-producers.

If the sample contains nucleic acids from enediyne-producers, specific hybridization of the probe to the nucleic acids from the enediyne-producer is then detected. Hybridization may be detected by labeling the probe with a detectable agent such as a radioactive isotope, a fluorescent dye or an enzyme capable of catalyzing the formation of a detectable product. Many methods for using the labeled probes to detect the presence of nucleic acids in a sample are familiar to those skilled in the art. These include Southern Blots, Northern Blots, colony hybridization procedures, and dot blots.

Another aspect of the present invention is an isolated or purified polypeptide comprising the sequence of one of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof. As discussed above, such polypeptides may be obtained by inserting a nucleic acid encoding the polypeptide into a vector such that the coding sequence is operably linked to a sequence capable of driving the expression of the encoded polypeptide in a suitable host cell. For example, the expression vector may comprise a promoter, a ribosome binding site for translation initiation and a transcription terminator. The vector may also include appropriate sequences for modulating expression levels, an origin of replication and a selectable marker.
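
The recitation of fragments comprising at least a given number of consecutive amino acids can be illustrated computationally. The sketch below is not part of the disclosed methods; the function names and the example sequences are hypothetical, and the check simply asks whether a candidate sequence contains a run of consecutive residues taken from a reference sequence.

```python
# Fragment lengths recited in the text above.
FRAGMENT_LENGTHS = (5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150)


def contains_fragment(reference: str, candidate: str, min_len: int) -> bool:
    """True if candidate contains at least min_len consecutive residues
    that occur consecutively in reference."""
    if min_len <= 0:
        raise ValueError("min_len must be positive")
    if len(reference) < min_len or len(candidate) < min_len:
        return False
    # All windows of length min_len found in the reference sequence.
    windows = {reference[i:i + min_len]
               for i in range(len(reference) - min_len + 1)}
    return any(candidate[j:j + min_len] in windows
               for j in range(len(candidate) - min_len + 1))


def largest_recited_length(reference: str, candidate: str) -> int:
    """Largest recited fragment length satisfied by candidate, or 0."""
    best = 0
    for n in FRAGMENT_LENGTHS:
        if contains_fragment(reference, candidate, n):
            best = n
    return best


if __name__ == "__main__":
    ref = "MKTAYIAKQRQISFVKSHFSRQ"          # hypothetical reference
    cand = "GGG" + ref[3:15] + "PPP"        # shares 12 consecutive residues
    print(largest_recited_length(ref, cand))
```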

Promoters suitable for expressing the polypeptide or fragment thereof in bacteria include the E. coli lac or trp promoters, the lacI promoter, the lacZ promoter, the T3 promoter, the T7 promoter, the gpt promoter, the lambda P_(R) promoter, the lambda P_(L) promoter, promoters from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), and the acid phosphatase promoter. Fungal promoters include the α factor promoter. Eukaryotic promoters include the CMV immediate early promoter, the HSV thymidine kinase promoter, heat shock promoters, the early and late SV40 promoter, LTRs from retroviruses, and the mouse metallothionein-I promoter. Other promoters known to control expression of genes in prokaryotic or eukaryotic cells or their viruses may also be used.

Mammalian expression vectors may also comprise an origin of replication, any necessary ribosome binding sites, a polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5′ flanking nontranscribed sequences. In some embodiments, DNA sequences derived from the SV40 splice and polyadenylation sites may be used to provide the required nontranscribed genetic elements.

Vectors for expressing the polypeptide or fragment thereof in eukaryotic cells may also contain enhancers to increase expression levels. Enhancers are cis-acting elements of DNA, usually from about 10 to about 300 bp in length, that act on a promoter to increase its transcription. Examples include the SV40 enhancer on the late side of the replication origin bp 100 to 270, the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and the adenovirus enhancers.

In addition, the expression vectors preferably contain one or more selectable marker genes to permit selection of host cells containing the vector. Examples of selectable markers that may be used include genes encoding dihydrofolate reductase or genes conferring neomycin resistance for eukaryotic cell culture, genes conferring tetracycline or ampicillin resistance in E. coli, and the S. cerevisiae TRP1 gene.

In some embodiments, the nucleic acid encoding one of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof is assembled in appropriate phase with a leader sequence capable of directing secretion of the translated polypeptides or fragments thereof. Optionally, the nucleic acid can encode a fusion polypeptide in which one of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof is fused to heterologous peptides or polypeptides, such as N-terminal identification peptides which impart desired characteristics such as increased stability or simplified purification or detection.

The appropriate DNA sequence may be inserted into the vector by a variety of procedures. In general, the DNA sequence is ligated to the desired position in the vector following digestion of the insert and the vector with appropriate restriction endonucleases. Alternatively, appropriate restriction enzyme sites can be engineered into a DNA sequence by PCR. A variety of cloning techniques are disclosed in Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. 1997 and Sambrook et al., Molecular Cloning: A Laboratory Manual 2d Ed., Cold Spring Harbor Laboratory Press, 1989. Such procedures and others are deemed to be within the scope of those skilled in the art.

The vector may be, for example, in the form of a plasmid, a viral particle, or a phage. Other vectors include derivatives of chromosomal, nonchromosomal and synthetic DNA sequences, viruses, bacterial plasmids, phage DNA, baculovirus, yeast plasmids, vectors derived from combinations of plasmids and phage DNA, viral DNA such as vaccinia, adenovirus, fowl pox virus, and pseudorabies. A variety of cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, N.Y., (1989).

Particular bacterial vectors which may be used include the commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017), pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden), GEM1 (Promega Biotec, Madison, Wis., USA), pQE70, pQE60, pQE-9 (Qiagen), pD10, psiX174, pBluescript II KS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene), ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia), pKK232-8 and pCM7. Particular eukaryotic vectors include pSV2CAT, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, pMSG, and pSVL (Pharmacia). However, any other vector may be used as long as it is replicable and stable in the host cell.

The host cell may be any of the host cells familiar to those skilled in the art, including prokaryotic cells or eukaryotic cells. As representative examples of appropriate hosts, there may be mentioned: bacterial cells, such as E. coli, Streptomyces lividans, Bacillus subtilis, Salmonella typhimurium and various species within the genera Pseudomonas, Streptomyces, and Staphylococcus; fungal cells, such as yeast; insect cells such as Drosophila S2 and Spodoptera Sf9; animal cells such as CHO, COS or Bowes melanoma; and adenoviruses. The selection of an appropriate host is within the abilities of those skilled in the art.

The vector may be introduced into the host cells using any of a variety of techniques, including electroporation, transformation, transfection, transduction, viral infection, gene guns, or Ti-mediated gene transfer. Where appropriate, the engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying the genes of the present invention. Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter may be induced by appropriate means (e.g., temperature shift or chemical induction) and the cells may be cultured for an additional period to allow them to produce the desired polypeptide or fragment thereof.

Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract is retained for further purification. Microbial cells employed for expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Such methods are well known to those skilled in the art. The expressed polypeptide or fragment thereof can be recovered and purified from recombinant cell cultures by methods including ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite chromatography and lectin chromatography. Protein refolding steps can be used, as necessary, in completing configuration of the polypeptide. If desired, high performance liquid chromatography (HPLC) can be employed for final purification steps.

Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts (described by Gluzman, Cell, 23:175 (1981)), and other cell lines capable of expressing proteins from a compatible vector, such as the C127, 3T3, CHO, HeLa and BHK cell lines.

The constructs in host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence. Depending upon the host employed in a recombinant production procedure, the polypeptide produced by host cells containing the vector may be glycosylated or may be non-glycosylated. Polypeptides of the invention may or may not also include an initial methionine amino acid residue.

Alternatively, the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof can be synthetically produced by conventional peptide synthesizers. In other embodiments, fragments or portions of the polypeptides may be employed for producing the corresponding full-length polypeptide by peptide synthesis; therefore, the fragments may be employed as intermediates for producing the full-length polypeptides.

Cell-free translation systems can also be employed to produce one of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof using mRNAs transcribed from a DNA construct comprising a promoter operably linked to a nucleic acid encoding the polypeptide or fragment thereof. In some embodiments, the DNA construct may be linearized prior to conducting an in vitro transcription reaction. The transcribed mRNA is then incubated with an appropriate cell-free translation extract, such as a rabbit reticulocyte extract, to produce the desired polypeptide or fragment thereof.

The present invention also relates to variants of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof. The term “variant” includes derivatives or analogs of these polypeptides. In particular, the variants may differ in amino acid sequence from the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, by one or more substitutions, additions, deletions, fusions and truncations, which may be present in any combination.

The variants may be naturally occurring or created in vitro. In particular, such variants may be created using genetic engineering techniques such as site directed mutagenesis, random chemical mutagenesis, Exonuclease III deletion procedures, and standard cloning techniques. Alternatively, such variants, fragments, analogs, or derivatives may be created using chemical synthesis or modification procedures.

Other methods of making variants are also familiar to those skilled in the art. These include procedures in which nucleic acid sequences obtained from natural isolates are modified to generate nucleic acids which encode polypeptides having characteristics which enhance their value in industrial or laboratory applications. In such procedures, a large number of variant sequences having one or more nucleotide differences with respect to the sequence obtained from the natural isolate are generated and characterized. Preferably, these nucleotide differences result in amino acid changes with respect to the polypeptides encoded by the nucleic acids from the natural isolates.

For example, variants may be created using error prone PCR. In error prone PCR, DNA amplification is performed under conditions where the fidelity of the DNA polymerase is low, such that a high rate of point mutation is obtained along the entire length of the PCR product. Error prone PCR is described in Leung, D. W., et al., Technique, 1:11-15 (1989) and Caldwell, R. C. & Joyce G. F., PCR Methods Applic., 2:28-33 (1992). Variants may also be created using site directed mutagenesis to generate site-specific mutations in any cloned DNA segment of interest. Oligonucleotide mutagenesis is described in Reidhaar-Olson, J. F. and Sauer, R. T., Science, 241:53-57 (1988). Variants may also be created using directed evolution strategies such as those described in U.S. Pat. Nos. 6,361,974 and 6,372,497.
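
The outcome of error prone PCR, namely a library of variants carrying point mutations scattered along the whole product, can be pictured with a toy in silico model. The sketch below is an illustrative simplification only: it assumes a uniform, independent substitution rate per position and ignores the mutational bias of real polymerases; the function names, the template and the rate are hypothetical.

```python
import random

BASES = "ACGT"


def mutate(template: str, rate: float, rng: random.Random) -> str:
    """Return a copy of template in which each base is replaced by a
    different base with probability `rate` (substitutions only)."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie between 0 and 1")
    out = []
    for base in template:
        if rng.random() < rate:
            # Choose one of the three other bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def make_library(template: str, rate: float, size: int, seed: int = 0) -> list:
    """Generate `size` simulated variants of template."""
    rng = random.Random(seed)
    return [mutate(template, rate, rng) for _ in range(size)]


def count_differences(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    template = "ATGGCCAAGACCGTGATC"  # hypothetical template
    for variant in make_library(template, rate=0.05, size=5, seed=1):
        print(variant, count_differences(template, variant))
```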

The variants of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, may be variants in which one or more of the amino acid residues of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, are substituted with a conserved or non-conserved amino acid residue (preferably a conserved amino acid residue) and such substituted amino acid residue may or may not be one encoded by the genetic code.

Conservative substitutions are those that substitute a given amino acid in a polypeptide by another amino acid of like characteristics. Typically seen as conservative substitutions are the following replacements: replacement of an aliphatic amino acid such as Ala, Val, Leu and Ile with another aliphatic amino acid; replacement of a Ser with a Thr or vice versa; replacement of an acidic residue such as Asp or Glu with another acidic residue; replacement of a residue bearing an amide group, such as Asn or Gln, with another residue bearing an amide group; exchange of a basic residue such as Lys or Arg with another basic residue; and replacement of an aromatic residue such as Phe or Tyr with another aromatic residue.
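
The groupings listed above lend themselves to a simple lookup. The following minimal sketch encodes only the residues named in the preceding paragraph, using the standard one-letter amino acid codes; the group labels and the function name are hypothetical, and other definitions of conservative substitution exist.

```python
# Residue groups taken from the paragraph above (one-letter codes).
CONSERVATIVE_GROUPS = {
    "aliphatic": frozenset("AVLI"),   # Ala, Val, Leu, Ile
    "hydroxyl": frozenset("ST"),      # Ser, Thr
    "acidic": frozenset("DE"),        # Asp, Glu
    "amide": frozenset("NQ"),         # Asn, Gln
    "basic": frozenset("KR"),         # Lys, Arg
    "aromatic": frozenset("FY"),      # Phe, Tyr
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if replacing `original` with `replacement` stays within one
    of the groups above. Identical residues are not a substitution."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False
    return any(a in group and b in group
               for group in CONSERVATIVE_GROUPS.values())


if __name__ == "__main__":
    for pair in (("A", "V"), ("S", "T"), ("D", "K"), ("F", "Y")):
        print(pair, is_conservative(*pair))
```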

Other variants are those in which one or more of the amino acid residues of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101 include a substituent group.

Still other variants are those in which the polypeptide is associated with another compound, such as a compound to increase the half-life of the polypeptide (for example, polyethylene glycol).

Additional variants are those in which additional amino acids are fused to the polypeptide, such as a leader sequence, a secretory sequence, a proprotein sequence or a sequence which facilitates purification, enrichment, or stabilization of the polypeptide.

In some embodiments, the fragments, derivatives and analogs retain the same biological function or activity as the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101. In other embodiments, the fragment, derivative or analogue includes a fused heterologous sequence which facilitates purification, enrichment, detection, stabilization or secretion of the polypeptide that can be enzymatically cleaved, in whole or in part, away from the fragment, derivative or analogue.

Another aspect of the present invention are polypeptides or fragments thereof which have at least 70%, at least 80%, at least 85%, at least 90%, or more than 95% homology to one of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or a fragment comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof. Homology may be determined using a program, such as BLASTP version 2.2.2 with the default parameters, which aligns the polypeptides or fragments being compared and determines the extent of amino acid identity or similarity between them. It will be appreciated that amino acid “homology” includes conservative substitutions such as those described above.
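
To make the percentage thresholds concrete, the sketch below computes percent identity for two sequences that have already been aligned (gaps written as "-") and reports which of the recited thresholds are met. It is an illustration under stated assumptions and not a substitute for BLASTP, which performs its own alignment and scoring: the denominator used here (all aligned columns not consisting solely of gaps) is an assumption, conservative substitutions are not credited, and all names and sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the aligned columns of two pre-aligned
    sequences; columns where both sequences have a gap are ignored."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


def thresholds_met(pid: float) -> list:
    """Recited thresholds satisfied by a percent value."""
    met = [f"at least {t}%" for t in (70, 80, 85, 90) if pid >= t]
    if pid > 95:
        met.append("more than 95%")
    return met


if __name__ == "__main__":
    pid = percent_identity("MKTAYIAKQR", "MKTAYIAKQK")  # hypothetical pair
    print(pid, thresholds_met(pid))
```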

The polypeptides or fragments having homology to one of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or a fragment comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof may be obtained by isolating the nucleic acids encoding them using the techniques described above.

Alternatively, the homologous polypeptides or fragments may be obtained through biochemical enrichment or purification procedures. The sequence of potentially homologous polypeptides or fragments may be determined by proteolytic digestion, gel electrophoresis and/or microsequencing. The sequence of the prospective homologous polypeptide or fragment can be compared to one of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or a fragment comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof using a program such as BLASTP version 2.2.2 with the default parameters.

The polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or fragments, derivatives or analogs thereof comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof, may be used in a variety of applications. For example, the polypeptides or fragments, derivatives or analogs thereof may be used to biocatalyze biochemical reactions. In particular, the polypeptides of the PKSE family, namely SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93 or fragments, derivatives or analogs thereof, and the polypeptides of the TEBC family, namely SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95 or fragments, derivatives or analogs thereof, may be used in any combination, in vitro or in vivo, to direct the synthesis or modification of an enediyne warhead or a substructure thereof. Polypeptides of the UNBL family, namely SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97 or fragments, derivatives or analogs thereof, may be used in vitro or in vivo to direct or aid the synthesis or modification of an enediyne warhead or a substructure thereof. Polypeptides of the UNBV family, namely SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99 or fragments, derivatives or analogs thereof, may be used in vitro or in vivo to direct or aid the synthesis or modification of an enediyne warhead or a substructure thereof. Polypeptides of the UNBU family, namely SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101 or fragments, derivatives or analogs thereof, may be used in vitro or in vivo to direct or aid the synthesis or modification of an enediyne warhead or a substructure thereof.
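
For bookkeeping purposes, the family assignments recited above can be captured in a small lookup table. The sketch below merely transcribes the SEQ ID numbers listed in the preceding paragraph; the variable and function names are hypothetical.

```python
# SEQ ID NOS per protein family, transcribed from the paragraph above.
FAMILY_SEQ_IDS = {
    "PKSE": [1, 13, 23, 33, 43, 53, 63, 73, 83, 93],
    "TEBC": [3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95],
    "UNBL": [7, 17, 27, 37, 47, 57, 67, 77, 87, 97],
    "UNBV": [9, 19, 29, 39, 49, 59, 69, 79, 89, 99],
    "UNBU": [11, 21, 31, 41, 51, 61, 71, 81, 91, 101],
}

# Reverse lookup: SEQ ID number -> family name.
SEQ_ID_TO_FAMILY = {
    seq_id: family
    for family, seq_ids in FAMILY_SEQ_IDS.items()
    for seq_id in seq_ids
}


def family_of(seq_id: int) -> str:
    """Return the protein family recited for a polypeptide SEQ ID NO."""
    try:
        return SEQ_ID_TO_FAMILY[seq_id]
    except KeyError:
        raise ValueError(f"SEQ ID NO: {seq_id} is not a listed polypeptide")


if __name__ == "__main__":
    print(family_of(5))    # TEBC
    print(family_of(93))   # PKSE
```

Every odd SEQ ID NO from 1 to 101 appears exactly once in this table, which matches the polypeptide list recited throughout this section.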

The polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or fragments, derivatives or analogues thereof comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof, may also be used to generate antibodies which bind specifically to the polypeptides or fragments, derivatives or analogues. The antibodies generated from SEQ ID NOS: 1, 3, 5, 7, 9, 11 may be used to determine whether a biological sample contains Streptomyces macromyceticus or a related microorganism. The antibodies generated from SEQ ID NOS: 13, 15, 17, 19, 21 may be used to determine whether a biological sample contains Micromonospora echinospora subsp. calichensis or a related microorganism. The antibodies generated from SEQ ID NOS: 23, 25, 27, 29, 31 may be used to determine whether a biological sample contains Streptomyces ghanaensis or a related microorganism. The antibodies generated from SEQ ID NOS: 33, 35, 37, 39, 41 may be used to determine whether a biological sample contains Streptomyces carzinostaticus subsp. neocarzinostaticus or a related microorganism. The antibodies generated from SEQ ID NOS: 43, 45, 47, 49, 51 may be used to determine whether a biological sample contains Amycolatopsis orientalis or a related microorganism. The antibodies generated from SEQ ID NOS: 53, 55, 57, 59, 61 may be used to determine whether a biological sample contains Kitasatosporia sp. or a related microorganism. The antibodies generated from SEQ ID NOS: 63, 65, 67, 69, 71 may be used to determine whether a biological sample contains Micromonospora megalomicea or a related microorganism. The antibodies generated from SEQ ID NOS: 73, 75, 77, 79, 81 may be used to determine whether a biological sample contains Saccharothrix aerocolonigenes or a related microorganism. The antibodies generated from SEQ ID NOS: 83, 85, 87, 89, 91 may be used to determine whether a biological sample contains Streptomyces kaniharaensis or a related microorganism. The antibodies generated from SEQ ID NOS: 93, 95, 97, 99, 101 may be used to determine whether a biological sample contains Streptomyces citricolor or a related microorganism.
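
The correspondence between antibody source sequences and the organisms they may detect can likewise be summarized as a table. The following sketch transcribes the associations stated above; the variable and function names are hypothetical.

```python
# Polypeptide SEQ ID NOS and the organism associated with each group,
# transcribed from the paragraph above.
ORGANISM_SEQ_IDS = {
    "Streptomyces macromyceticus": (1, 3, 5, 7, 9, 11),
    "Micromonospora echinospora subsp. calichensis": (13, 15, 17, 19, 21),
    "Streptomyces ghanaensis": (23, 25, 27, 29, 31),
    "Streptomyces carzinostaticus subsp. neocarzinostaticus": (33, 35, 37, 39, 41),
    "Amycolatopsis orientalis": (43, 45, 47, 49, 51),
    "Kitasatosporia sp.": (53, 55, 57, 59, 61),
    "Micromonospora megalomicea": (63, 65, 67, 69, 71),
    "Saccharothrix aerocolonigenes": (73, 75, 77, 79, 81),
    "Streptomyces kaniharaensis": (83, 85, 87, 89, 91),
    "Streptomyces citricolor": (93, 95, 97, 99, 101),
}


def organism_for(seq_id: int) -> str:
    """Return the organism associated with a polypeptide SEQ ID NO."""
    for organism, seq_ids in ORGANISM_SEQ_IDS.items():
        if seq_id in seq_ids:
            return organism
    raise ValueError(f"SEQ ID NO: {seq_id} is not a listed polypeptide")


if __name__ == "__main__":
    print(organism_for(45))   # Amycolatopsis orientalis
```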

In such procedures, a biological sample is contacted with an antibody capable of specifically binding to one of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof. The ability of the biological sample to bind to the antibody is then determined. For example, binding may be determined by labeling the antibody with a detectable label such as a fluorescent agent, an enzymatic label, or a radioisotope. Alternatively, binding of the antibody to the sample may be detected using a secondary antibody having such a detectable label thereon. A variety of assay protocols may be used to detect the presence of Micromonospora echinospora subsp. calichensis, Streptomyces ghanaensis, Streptomyces carzinostaticus subsp. neocarzinostaticus, Amycolatopsis orientalis, Kitasatosporia sp., Micromonospora megalomicea, Saccharothrix aerocolonigenes, Streptomyces kaniharaensis, Streptomyces citricolor, or the presence of polypeptides related to SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101 in a sample. Particular assays include ELISA assays, sandwich assays, radioimmunoassays, and Western Blots. Alternatively, antibodies generated from SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101 may be used to determine whether a biological sample contains related polypeptides that may be involved in the biosynthesis of enediyne natural products or other enediyne-like compounds.

Polyclonal antibodies generated against the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof can be obtained by direct injection of the polypeptides into an animal or by administering the polypeptides to an animal. The antibody so obtained will then bind the polypeptide itself. In this manner, even a sequence encoding only a fragment of the polypeptide can be used to generate antibodies which may bind to the whole native polypeptide. Such antibodies can then be used to isolate the polypeptide from cells expressing that polypeptide.

For preparation of monoclonal antibodies, any technique which provides antibodies produced by continuous cell line cultures can be used. Examples include the hybridoma technique (Kohler and Milstein, 1975, Nature, 256:495-497), the trioma technique, the human B-cell hybridoma technique (Kozbor et al., 1983, Immunology Today 4:72), and the EBV-hybridoma technique (Cole, et al., 1985, in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96).

Techniques described for the production of single chain antibodies (U.S. Pat. No. 4,946,778) can be adapted to produce single chain antibodies to the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof. Alternatively, transgenic mice may be used to express humanized antibodies to these polypeptides or fragments thereof.

Antibodies generated against the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof may be used in screening for similar polypeptides from a sample containing organisms or cell-free extracts thereof. In such techniques, polypeptides from the sample are contacted with the antibodies and those polypeptides which specifically bind the antibody are detected. Any of the procedures described above may be used to detect antibody binding. One such screening assay is described in “Methods for measuring Cellulase Activities”, Methods in Enzymology, Vol 160, pp. 87-116.

As used herein, the term “enediyne-specific nucleic acid codes” encompasses the nucleotide sequences of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, fragments of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, nucleotide sequences homologous to SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, or homologous to fragments of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, and sequences complementary to all of the preceding sequences. The fragments include portions of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102 comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive nucleotides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102. Preferably, the fragments are novel fragments. Homologous sequences and fragments of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102 refer to a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 80%, 75% or 70% homology to these sequences. Homology may be determined using any of the computer programs and parameters described herein, including BLASTN and TBLASTX with the default parameters. Homologous sequences also include RNA sequences in which uridines replace the thymines in the nucleic acid codes of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102. The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error. It will be appreciated that the nucleic acid codes of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102 can be represented in the traditional single character format in which G, A, T and C denote the guanine, adenine, thymine and cytosine bases of the deoxyribonucleic acid (DNA) sequence respectively, or in which G, A, U and C denote the guanine, adenine, uracil and cytosine bases of the ribonucleic acid (RNA) sequence (see the inside back cover of Stryer, Biochemistry, 3^(rd) edition, W. H. Freeman & Co., New York) or in any other format which records the identity of the nucleotides in a sequence.
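
Two of the sequence relationships mentioned above, complementary sequences and RNA sequences in which uridines replace thymines, are straightforward to express in the single character format. The sketch below is illustrative only; the function names and the example sequence are hypothetical, and only the unambiguous bases G, A, T and C are handled.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _validate(seq: str, alphabet: str) -> None:
    """Raise ValueError if seq contains characters outside alphabet."""
    unexpected = set(seq) - set(alphabet)
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")


def reverse_complement(dna: str) -> str:
    """Complementary strand of a DNA sequence, written 5' to 3'."""
    dna = dna.upper()
    _validate(dna, "ACGT")
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def dna_to_rna(dna: str) -> str:
    """RNA form of a DNA sequence: each T is written as U."""
    dna = dna.upper()
    _validate(dna, "ACGT")
    return dna.replace("T", "U")


if __name__ == "__main__":
    example = "ATGGCC"  # hypothetical sequence
    print(reverse_complement(example))
    print(dna_to_rna(example))
```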

“Enediyne-specific polypeptide codes” encompass the polypeptide sequences of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101 which are encoded by the cDNAs of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101; polypeptide sequences homologous to the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, or fragments of any of the preceding sequences. Homologous polypeptide sequences refer to a polypeptide sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, 75% or 70% homology to one of the polypeptide sequences of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101. Polypeptide sequence homology may be determined using any of the computer programs and parameters described herein, including BLASTP version 2.2.2 with the default parameters or with any user-specified parameters. The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error. The polypeptide fragments comprise at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of the polypeptides of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101. Preferably the fragments are novel fragments. It will be appreciated that the polypeptide codes of the SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101 can be represented in the traditional single character format or three letter format (see the inside back cover of Stryer, Biochemistry, 3^(rd) edition, W.H. Freeman & Co., New York) or in any other format which relates the identity of the amino acids in a sequence.

A single sequence selected from enediyne-specific nucleic acid codes and enediyne-specific polypeptide codes is sometimes referred to herein as a subject sequence.

It will be readily appreciated by those skilled in the art that the enediyne-specific nucleic acid codes, a subset thereof, enediyne-specific polypeptide codes, a subset thereof, and a subject sequence can be stored, recorded and manipulated on any medium which can be read and accessed by a computer. As used herein, the words “recorded” and “stored” refer to a process for storing information on a computer medium. A skilled artisan can readily adopt any of the presently known methods for recording information on a computer readable medium to generate manufactures comprising one or more of the enediyne-specific nucleic acid codes, a subset thereof, enediyne-specific polypeptide codes, a subset thereof, and a subject sequence.

Computer readable media include magnetically readable media, optically readable media, electronically readable media and magnetic/optical media. For example, the computer readable media may be a hard disk, a floppy disk, a magnetic tape, CD-ROM, Digital Versatile Disk (DVD), Random Access Memory (RAM), or Read Only Memory (ROM) as well as other types of media known to those skilled in the art.

The enediyne-specific nucleic acid codes, a subset thereof and a subject sequence may be stored and manipulated in a variety of data processor programs in a variety of formats. For example, the enediyne-specific nucleic acid codes, a subset thereof, enediyne-specific polypeptide codes, a subset thereof, and a subject sequence may be stored as ASCII or text in a word processing file, such as Microsoft WORD or WORDPERFECT, or in a variety of database programs familiar to those of skill in the art, such as DB2 or ORACLE. In addition, many computer programs and databases may be used as sequence comparers, identifiers or sources of query nucleotide sequences or query polypeptide sequences to be compared to the enediyne-specific nucleic acid codes, a subset thereof, the enediyne-specific polypeptide codes, a subset thereof, and a subject sequence.

The following list is intended not to limit the invention but to provide guidance to programs and databases useful with the enediyne-specific nucleic acid codes, a subset thereof, enediyne-specific polypeptide codes, a subset thereof, and a subject sequence. The programs and databases which may be used include, but are not limited to: MacPattern (EMBL), DiscoveryBase (Molecular Applications Group), GeneMine (Molecular Applications Group), Look (Molecular Applications Group), MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI), BLASTN and BLASTX (Altschul et al., J. Mol. Biol. 215:403 (1990)), FASTA (Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85:2444 (1988)), FASTDB (Brutlag et al. Comp. App. Biosci. 6:237-245, 1990), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular Simulations Inc.), Cerius².DBAccess (Molecular Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight II (Molecular Simulations Inc.), Discover (Molecular Simulations Inc.), CHARMm (Molecular Simulations Inc.), Felix (Molecular Simulations Inc.), DelPhi (Molecular Simulations Inc.), QuanteMM (Molecular Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler (Molecular Simulations Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular Simulations Inc.), WetLab (Molecular Simulations Inc.), WetLab Diversity Explorer (Molecular Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular Simulations Inc.), the MDL Available Chemicals Directory database, the MDL Drug Data Report data base, the Comprehensive Medicinal Chemistry database, Derwent's World Drug Index database, the BioByteMasterFile database, the Genbank database, and the Genseqn database. Many other programs and databases would be apparent to one of skill in the art given the present disclosure.

Embodiments of the present invention include systems, particularly computer systems that store and manipulate the sequence information described herein. As used herein, “a computer system” refers to the hardware components, software components, and data storage components used to analyze enediyne-specific nucleic acid codes, a subset thereof, enediyne-specific polypeptide codes, a subset thereof, or a subject sequence.

Preferably, the computer system is a general purpose system that comprises a processor and one or more internal data storage components for storing data, and one or more data retrieving devices for retrieving the data stored on the data storage components. A skilled artisan can readily appreciate that any one of the currently available computer systems is suitable.

One example of a computer system is illustrated in FIG. 1. The computer system of FIG. 1 includes a number of components connected to a central system bus 116, including a central processing unit 118 with internal 118 and/or external cache memory 120, system memory 122, display adapter 102 connected to a monitor 100, network adapter 126 which may also be referred to as a network interface, internal modem 124, sound adapter 128, IO controller 132 to which may be connected a keyboard 140 and mouse 138, or other suitable input device such as a trackball or tablet, as well as external printer 134, and/or any number of external devices such as external modems, tape storage drives, or disk drives. One skilled in the art will readily appreciate that not all components illustrated in FIG. 1 are required to practice the invention and, likewise, additional components not illustrated in FIG. 1 may be present in a computer system contemplated for use with the invention.

One or more host bus adapters 114 may be connected to the system bus 116. To host bus adapter 114 may optionally be connected one or more storage devices such as disk drives 112 (removable or fixed), floppy drives 110, tape drives 108, digital versatile disk DVD drives 106, and compact disk CD ROM drives 104. The storage devices may operate in read-only mode and/or in read-write mode. The computer system may optionally include multiple central processing units 118, or multiple banks of memory 122.

Arrows 142 in FIG. 1 indicate the interconnection of internal components of the computer system. The arrows are illustrative only and do not specify exact connection architecture.

Software for accessing and processing the reference sequences (such as sequence comparison software, analysis software as well as search tools, annotation tools, and modeling tools etc.) may reside in main memory 122 during execution.

In one embodiment, the computer system further comprises a sequence comparison software for comparing the nucleic acid codes of a query sequence stored on a computer readable medium to a subject sequence which is also stored on a computer readable medium; or for comparing the polypeptide code of a query sequence stored on a computer readable medium to a subject sequence which is also stored on computer readable medium. A “sequence comparison software” refers to one or more programs that are implemented on the computer system to compare nucleotide sequences with other nucleotide sequences stored within the data storage means. The design of one example of a sequence comparison software is provided in FIGS. 2A, 2B, 2C and 2D.

The sequence comparison software will typically employ one or more specialized comparator algorithms. Protein and/or nucleic acid sequence similarities may be evaluated using any of the variety of sequence comparator algorithms and programs known in the art. Such algorithms and programs include, but are in no way limited to, TBLASTN, BLASTN, BLASTP, FASTA, TFASTA, CLUSTAL, HMMER, MAST, or other suitable algorithm known to those skilled in the art. (Pearson and Lipman, 1988, Proc. Natl. Acad. Sci USA 85(8): 2444-2448; Altschul et al., 1990, J. Mol. Biol. 215(3):403-410; Thompson et al., 1994, Nucleic Acids Res. 22(2):4673-4680; Higgins et al., 1996, Methods Enzymol. 266:383-402; Altschul et al., 1993, Nature Genetics 3:266-272; Eddy S. R., Bioinformatics 14:755-763, 1998; Bailey T L et al, J Steroid Biochem Mol Biol 1997 May; 62(1):29-44). One example of a comparator algorithm is illustrated in FIG. 3. Sequence comparator algorithms identified in this specification are particularly contemplated for use in this aspect of the invention.

The sequence comparison software will typically employ one or more specialized analyzer algorithms. One example of an analyzer algorithm is illustrated in FIG. 4. Any appropriate analyzer algorithm can be used to evaluate similarities, determined by the comparator algorithm, between a query sequence and a subject sequence (referred to herein as a query/subject pair). Based on context specific rules, the annotation of a subject sequence may be assigned to the query sequence. A skilled artisan can readily determine the selection of an appropriate analyzer algorithm and appropriate context specific rules. Analyzer algorithms identified elsewhere in this specification are particularly contemplated for use in this aspect of the invention.

FIGS. 2A, 2B, 2C and 2D together provide a flowchart of one example of a sequence comparison software for comparing query sequences to a subject sequence. The software determines if a gene or set of genes represented by their nucleotide sequence, polypeptide sequence or other representation (the query sequence) is significantly similar to the enediyne-specific nucleic acid codes, a subset thereof, enediyne-specific polypeptide codes, or a subset thereof, of the invention (the subject sequence). The software may be implemented in the C or C++ programming language, Java, Perl or other suitable programming language known to a person skilled in the art.

Referring to FIG. 2A, the query sequence(s) may be accessed by the program by means of input from the user 210, accessing a database 208 or opening a text file 206. The “query initialization process” allows a query sequence to be accessed and loaded into computer memory 122, or under control of the program stored on a disk drive 112 or other storage device in the form of a query sequence array 216. The query array 216 is one or more query nucleotide or polypeptide sequences accompanied by some appropriate identifiers.

A dataset is accessed by the program by means of input from the user 228, accessing a database 226, or opening a text file 224. The “subject data source initialization process” of FIG. 2B refers to the method by which a reference dataset containing one or more sequences selected from the enediyne-specific nucleic acid codes, a subset thereof, enediyne-specific polypeptide codes, a subset thereof, or a subject sequence is loaded into computer memory 122, or under control of the program stored on a disk drive 112 or other storage device in the form of a subject array 234. The subject array 234 comprises one or more subject nucleotide or polypeptide sequences accompanied by some appropriate identifiers.

The “comparison subprocess” of FIG. 2C is the process by which the comparator algorithm 238 is invoked by the software for pairwise comparisons between query elements in the query sequence array 216, and subject elements in the subject array 234. The “comparator algorithm” of FIG. 2C refers to the pairwise comparisons between a query sequence and subject sequence, i.e. a query/subject pair from their respective arrays 216, 234. Comparator algorithm 238 may be any algorithm that acts on a query/subject pair, including but not limited to homology algorithms such as BLAST, Smith-Waterman, FASTA, or statistical representation/probabilistic algorithms such as Markov models exemplified by HMMER, or other suitable algorithm known to one skilled in the art. Suitable algorithms would generally require a query/subject pair as input and return a score (an indication of likeness between the query and subject), usually through the use of appropriate statistical methods such as Karlin-Altschul statistics used in BLAST, Forward or Viterbi algorithms used in Markov models, or other suitable statistics known to those skilled in the art.

The sequence comparison software of FIG. 2C also comprises a means of analysis of the results of the pairwise comparisons performed by the comparator algorithm 238. The “analysis subprocess” of FIG. 2C is a process by which the analyzer algorithm 244 is invoked by the software. The “analyzer algorithm” refers to a process by which annotation of a subject is assigned to the query based on query/subject similarity as determined by the comparator algorithm 238 according to context-specific rules coded into the program or dynamically loaded at runtime. Context-specific rules are what the program uses to determine if the annotation of the subject can be assigned to the query given the context of the comparison. These rules allow the software to qualify the overall meaning of the results of the comparator algorithm 238.

In one embodiment, context-specific rules may state that for a set of query sequences to be considered representative of an enediyne locus the comparator algorithm 238 must determine that the set of query sequences contains at least one query sequence that shows a statistical similarity to reference sequences corresponding to a nucleic acid sequence code for a polypeptide from two of the groups consisting of: (1) SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93 and polypeptides having at least 75% homology to a polypeptide sequence of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93; (2) SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95 and polypeptides having at least 75% homology to a polypeptide sequence of SEQ ID NOS: 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95; (3) SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97, and polypeptides having at least 75% homology to a polypeptide sequence of SEQ ID NOS: 7, 17, 27, 37, 47, 57, 67, 77, 87, 97; (4) SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99 and polypeptides having at least 75% homology to a polypeptide sequence of SEQ ID NOS: 9, 19, 29, 39, 49, 59, 69, 79, 89, 99; (5) SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101 and polypeptides having at least 75% homology to a polypeptide sequence of SEQ ID NOS: 11, 21, 31, 41, 51, 61, 71, 81, 91, 101. Of course preferred context specific rules may specify a wide variety of thresholds for identifying enediyne-biosynthetic genes or enediyne-producing organisms without departing from the scope of the invention. Some thresholds contemplate that at least one query sequence in the set of query sequences shows a statistical similarity to the nucleic acid code corresponding to 2 or 3 or 4 or 5 of the above 5 groups of polypeptides diagnostic of enediyne biosynthetic genes. Other context specific rules set the level of homology required in each of the groups at 70%, 80%, 85%, 90%, 95% or 98% in regards to any one or more of the subject sequences.
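
By way of illustration only, a context-specific rule of the kind just described might be sketched in Python as follows. The sketch assumes that the comparator has already reduced each query to a tuple of query identifier, matched protein family and percent homology. The names ContextRule, min_homology and min_families, and the tuple layout, are hypothetical and are not part of the disclosure; the family labels are those of the five protein families named elsewhere in this specification.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

# Labels of the five diagnostic protein families named in the specification.
FAMILIES = ("PKSE", "TEBC", "UNBL", "UNBV", "UNBU")

# Hypothetical hit layout: (query identifier, matched family, percent homology).
Hit = Tuple[str, str, float]


@dataclass(frozen=True)
class ContextRule:
    """Hypothetical container for one context-specific rule."""
    min_homology: float = 75.0  # percent homology required for a hit to count
    min_families: int = 2       # number of distinct families that must be hit

    def families_hit(self, hits: Iterable[Hit]) -> Set[str]:
        """Return the families matched by at least one qualifying query."""
        return {
            family
            for _query, family, homology in hits
            if family in FAMILIES and homology >= self.min_homology
        }

    def is_enediyne_locus(self, hits: Iterable[Hit]) -> bool:
        """Apply the rule to the hits of a whole set of query sequences."""
        return len(self.families_hit(hits)) >= self.min_families


# Illustrative (made-up) hits for three open reading frames.
hits = [("orf1", "PKSE", 82.0), ("orf2", "TEBC", 77.5), ("orf3", "UNBL", 60.0)]
print(ContextRule().is_enediyne_locus(hits))                # two families qualify
print(ContextRule(min_families=3).is_enediyne_locus(hits))  # stricter rule
```

Raising min_families to 3, 4 or 5, or changing min_homology to 70, 80, 85, 90, 95 or 98, corresponds to the alternative thresholds contemplated in the preceding paragraph.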

In another embodiment context-specific rules may state that for a query sequence to be considered an enediyne polyketide synthase, the comparator algorithm 238 must determine that the query sequence shows a statistical similarity to subject sequences corresponding to a nucleic acid sequence code for a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93, polypeptides having at least 75% homology to a polypeptide of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93 and fragments comprising at least 500 consecutive amino acids of the polypeptides of SEQ ID NOS: 1, 13, 23, 33, 43, 53, 63, 73, 83, 93. Of course preferred context specific rules may specify a wide variety of thresholds for identifying enediyne polyketide synthase proteins without departing from the scope of the invention. Some context specific rules set the level of homology required of the query sequence at 70%, 80%, 85%, 90%, 95% or 98% in regards to the reference sequences.

Thus, the analysis subprocess may be employed in conjunction with any other context specific rules and may be adapted to suit different embodiments. The principal function of the analyzer algorithm 244 is to assign meaning or a diagnosis to a query or set of queries based on context specific rules that are application specific and may be changed without altering the overall role of the analyzer algorithm 244.

Finally the sequence comparison software of FIG. 2 comprises a means of returning the results of the comparisons by the comparator algorithm 238 and of the analyses by the analyzer algorithm 244 to the user or process that requested the comparison or comparisons. The “display/report subprocess” of FIG. 2D is the process by which the results of the comparisons by the comparator algorithm 238 and analyses by the analyzer algorithm 244 are returned to the user or process that requested the comparison or comparisons. The results 240, 246 may be written to a file 252, displayed in some user interface such as a console, custom graphical interface, web interface, or other suitable implementation specific interface, or uploaded to some database such as a relational database, or other suitable implementation specific database.

Once the results have been returned to the user or process that requested the comparison or comparisons the program exits.

The principle of the sequence comparison software of FIG. 2 is to receive or load a query or queries, receive or load a reference dataset, then run a pairwise comparison by means of the comparator algorithm 238, then evaluate the results using an analyzer algorithm 244 to arrive at a determination if the query or queries bear significant similarity to the reference sequences, and finally return the results to the user or calling program or process.

FIG. 3 is a flow diagram illustrating one embodiment of a comparator algorithm 238 process in a computer for determining whether two sequences are homologous. The comparator algorithm receives a query/subject pair for comparison, performs an appropriate comparison, and returns the pair along with a calculated degree of similarity.

Referring to FIG. 3, the comparison is initiated at the beginning of sequences 304. A match of (x) characters is attempted 306 where (x) is a user specified number. If a match is not found the query sequence is advanced 316 by one amino acid with respect to the subject, and if the end of the query has not been reached 318 another match of (x) characters is attempted 306. Thus if no match has been found the query is incrementally advanced in its entirety past the initial position of the subject; once the end of the query is reached 318, the subject pointer is advanced by one amino acid and the query pointer is reset to the beginning of the query 318. If the end of the subject has been reached and still no matches have been found a null homology result score is assigned 324 and the algorithm returns the pair of sequences along with a null score to the calling process or program. The algorithm then exits 326. If instead a match is found 308, an extension of the matched region is attempted 310 and the match is analyzed statistically 312. The extension may be unidirectional or bidirectional. The algorithm continues in a loop extending the matched region and computing the homology score, giving penalties for mismatches and taking into consideration that, given the chemical properties of the amino acid side chains, not all mismatches are equal. For example a mismatch of a lysine with an arginine, both of which have basic side chains, receives a lesser penalty than a mismatch between lysine and glutamate, which has an acidic side chain. The extension loop stops once the accumulated penalty exceeds some user specified value, or if the end of either sequence is reached 312. The maximal score is stored 314, and the query sequence is advanced 316 by one amino acid with respect to the subject, and if the end of the query has not been reached 318 another match of (x) characters is attempted 306. The process continues until the entire length of the subject has been evaluated for matches to the entire length of the query. All individual scores and alignments are stored 314 by the algorithm and an overall score is computed 324 and stored. The algorithm returns the pair of sequences along with local and global scores to the calling process or program. The algorithm then exits 326.

Comparator algorithm 238 may be represented in pseudocode as follows:

INPUT:
    Q[m]: query, m is the length
    S[n]: subject, n is the length
    x: x is the size of a segment
START:
    for each i in [1,n] do
        for each j in [1,m] do
            if ( j + x − 1 ) <= m and ( i + x − 1 ) <= n then
                if Q(j, j+x−1) = S(i, i+x−1) then
                    k=1;
                    while Q(j, j+x−1+k ) = S(i, i+x−1+k ) do
                        k++
                    Store highest local homology
    Compute overall homology score
    Return local and overall homology scores
END.
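
For illustration, the pseudocode above may be rendered as runnable Python along the following lines. This is a minimal sketch under stated assumptions, not a definitive implementation: indexing is zero-based, matching is exact (no substitution penalties or gaps), extension proceeds rightward only, an explicit end-of-sequence check is added to the extension loop, and the "overall homology score", which the pseudocode does not define, is assumed here to be the longest local match divided by the length of the shorter sequence. The function name compare is hypothetical.

```python
from typing import Tuple


def compare(query: str, subject: str, x: int = 3) -> Tuple[int, float]:
    """Seed-and-extend comparison of a query/subject pair.

    Returns (best_local, overall): best_local is the length of the longest
    exact match that starts with an x-character seed; overall is an ASSUMED
    summary score (best_local divided by the shorter sequence length).
    """
    m, n = len(query), len(subject)
    best_local = 0
    for i in range(n - x + 1):          # seed start in the subject
        for j in range(m - x + 1):      # seed start in the query
            if query[j:j + x] != subject[i:i + x]:
                continue                # no seed match at this offset pair
            k = x                       # current length of the matched region
            # Extend rightward while characters agree and neither sequence ends.
            while j + k < m and i + k < n and query[j + k] == subject[i + k]:
                k += 1
            best_local = max(best_local, k)   # store highest local homology
    shorter = min(m, n)
    overall = best_local / shorter if shorter else 0.0
    return best_local, overall


# Illustrative (made-up) sequences sharing the segment "KTAYI".
print(compare("MKTAYIAK", "GGKTAYIGG", x=3))
```

A scoring matrix that penalizes a lysine/arginine mismatch less than a lysine/glutamate mismatch, as described with reference to FIG. 3, could replace the exact character test in the extension loop.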

The comparator algorithm 238 may be written for use on nucleotide sequences, in which case the scoring scheme would be implemented so as to calculate scores and apply penalties based on the chemical nature of nucleotides. The comparator algorithm 238 may also provide for the presence of gaps in the scoring method for nucleotide or polypeptide sequences.

BLAST is one implementation of the comparator algorithm 238. HMMER is another implementation of the comparator algorithm 238 based on Markov model analysis. In a HMMER implementation a query sequence would be compared to a mathematical model representative of a subject sequence or sequences rather than using sequence homology.

FIG. 4 is a flow diagram illustrating an analyzer algorithm 244 process for detecting the presence of an enediyne biosynthetic locus. The analyzer algorithm of FIG. 4 may be used in the process by which the annotation of a subject is assigned to the query based on their similarity as determined by the comparator algorithm 238 and according to context-specific rules coded into the program or dynamically loaded at runtime. Context-specific rules are what determine if the annotation of the subject can be assigned to the query given the context of the comparison. Context-specific rules set the thresholds for determining the level and quality of similarity that would be accepted in the process of evaluating matched pairs.

The analyzer algorithm 244 receives as its input an array of pairs that have been matched by the comparator algorithm 238. The array consists of at least a query identifier, a subject identifier and the associated value of the measure of their similarity. To determine if a group of query sequences includes sequences diagnostic of an enediyne biosynthetic gene cluster, a reference or diagnostic array 406 is generated by accessing a data source and retrieving enediyne specific information 404 relating to enediyne-specific nucleic acid codes and enediyne-specific polypeptide codes. Diagnostic array 406 consists at least of subject identifiers and their associated annotation. Annotation may include reference to the five protein families diagnostic of enediyne biosynthetic gene clusters, i.e. PKSE, TEBC, UNBL, UNBV and UNBU. Annotation may also include information regarding exclusive presence in loci of a specific structural class or may include previously computed matches to other databases, for example databases of motifs.

Once the algorithm has successfully generated or received the two necessary arrays 402, 406, and holds in memory any context specific rules, each matched pair as determined by the comparator algorithm 238 can be evaluated. The algorithm will perform an evaluation 408 of each matched pair and based on the context specific rules confirm or fail to confirm the match as valid 410. In cases of successful confirmation of the match 410 the annotation of the subject is assigned to the query. Results of each comparison are stored 412. The loop ends when the end of the query/subject array is reached. Once all query/subject pairs have been evaluated against enediyne-specific nucleic acid codes and enediyne-specific polypeptide codes, a final determination can be made as to whether the query set of ORFs represents an enediyne locus 416.

The algorithm then returns the overall diagnosis and an array of characterized query/subject pairs along with supporting evidence to the calling program or process and then terminates 418.
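
The evaluation loop of FIG. 4 may be sketched in Python as shown below. This is an illustrative sketch only: the function name analyze, the tuple layout of a matched pair, the dictionary used for the diagnostic array, the subject identifiers and the default thresholds are all assumptions made for the example and do not appear in the disclosure.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical layouts: a matched pair is (query id, subject id, similarity);
# the diagnostic array maps a subject id to its annotation (a family label).
MatchedPair = Tuple[str, str, float]
Characterized = Tuple[str, str, Optional[str]]


def analyze(
    pairs: List[MatchedPair],
    diagnostic: Dict[str, str],
    min_similarity: float = 75.0,
    decide: Callable[[Set[str]], bool] = lambda families: len(families) >= 2,
) -> Tuple[bool, List[Characterized]]:
    """Evaluate each matched pair, transfer annotation, return a diagnosis."""
    results: List[Characterized] = []
    confirmed: Set[str] = set()
    for query_id, subject_id, similarity in pairs:
        annotation = diagnostic.get(subject_id)
        if annotation is not None and similarity >= min_similarity:
            # Match confirmed: the subject's annotation is assigned to the query.
            results.append((query_id, subject_id, annotation))
            confirmed.add(annotation)
        else:
            # Match not confirmed: the pair is stored without annotation.
            results.append((query_id, subject_id, None))
    # Final determination once every query/subject pair has been evaluated.
    return decide(confirmed), results


# Made-up subject identifiers and similarity values for illustration.
diagnostic = {"subj_a": "PKSE", "subj_b": "TEBC"}
pairs = [("orf1", "subj_a", 88.0), ("orf2", "subj_b", 79.0), ("orf3", "subj_b", 40.0)]
is_locus, characterized = analyze(pairs, diagnostic)
print(is_locus, characterized)
```

Passing a different diagnostic dictionary or decide function corresponds to loading a different diagnostic array or set of context specific rules without changing the loop itself.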

The analyzer algorithm 244 may be configured to dynamically load different diagnostic arrays and context specific rules. It may be used for example in the comparison of query/subject pairs with diagnostic subjects for other biosynthetic pathways, such as chromoprotein enediyne-specific nucleic acid codes or non-chromoprotein enediyne-specific polypeptide codes, or other sets of annotated subjects.

The present invention will be further described with reference to the following examples; however, it is to be understood that the present invention is not limited to such examples.

EXAMPLES

Example 1

Identification and Sequencing of the Macromomycin (Auromomycin) Biosynthetic Locus

Macromomycin is a chromoprotein enediyne produced by Streptomyces macromyceticus (NRRL B-5335). Macromomycin is believed to be a derivative of a larger chromoprotein enediyne compound referred to as auromomycin (Vandre and Montgomery (1982) Biochemistry Vol 21 pp. 3343-3352; Yamashita et al. (1979) J. Antibiot. Vol. 32 pp. 330-339). Thus, throughout the specification, reference to macromomycin is intended to encompass the molecules referred to by some authors as auromomycin. Likewise, reference to the biosynthetic locus for macromomycin is intended to encompass the biosynthetic locus that directs the synthesis of the molecules some authors have referred to as macromomycin and auromomycin.

Streptomyces macromyceticus (NRRL B-5335) was obtained from the Agricultural Research Service collection (National Center for Agricultural Utilization Research, 1815 N. University Street, Peoria, Ill. 61604) and cultured using standard microbiological techniques (Kieser et al., supra). The organism was propagated on oatmeal agar medium at 28 degrees Celsius for several days. For isolation of high molecular weight genomic DNA, cell mass from three freshly grown, near confluent 100 mm petri dishes was used. The cell mass was collected by gentle scraping with a plastic spatula. Residual agar medium was removed by repeated washes with STE buffer (75 mM NaCl; 20 mM Tris-HCl, pH 8.0; 25 mM EDTA). High molecular weight DNA was isolated by established protocols (Kieser et al., supra) and its integrity was verified by field inversion gel electrophoresis (FIGE) using the preset program number 6 of the FIGE MAPPER™ power supply (BIORAD). This high molecular weight genomic DNA served for the preparation of a small size fragment genomic sampling library (GSL), i.e., the small insert library, as well as a large size fragment cluster identification library (CIL), i.e., the large insert library. Both libraries contained randomly generated S. macromyceticus genomic DNA fragments and, therefore, are representative of the entire genome of this organism.

For the generation of the S. macromyceticus GSL library, genomic DNA was randomly sheared by sonication. DNA fragments having a size range between 1.5 and 3 kb were fractionated on an agarose gel and isolated using standard molecular biology techniques (Sambrook et al., supra). The ends of the obtained DNA fragments were repaired using T4 DNA polymerase (Roche) as described by the supplier. This enzyme creates DNA fragments with blunt ends that can be subsequently cloned into an appropriate vector. The repaired DNA fragments were subcloned into a derivative of pBluescript SK+ vector (Stratagene) which does not allow transcription of cloned DNA fragments. This vector was selected as it contains a convenient polylinker region surrounded by sequences corresponding to universal sequencing primers such as T3, T7, SK, and KS (Stratagene). The unique EcoRV restriction site found in the polylinker region was used as it allows insertion of blunt-end DNA fragments. Ligation of the inserts, use of the ligation products to transform E. coli DH10B (Invitrogen) host and selection for recombinant clones were performed as previously described (Sambrook et al., supra). Plasmid DNA carrying the S. macromyceticus genomic DNA fragments was extracted by the alkaline lysis method (Sambrook et al., supra) and the insert size of 1.5 to 3 kb was confirmed by electrophoresis on agarose gels. Using this procedure, a library of small size random genomic DNA fragments is generated that covers the entire genome of the studied microorganism. The number of individual clones that can be generated is infinite but only a small number is further analyzed to sample the microorganism's genome.

A CIL library was constructed from the S. macromyceticus high molecular weight genomic DNA using the SuperCos-1 cosmid vector (Stratagene™). The cosmid arms were prepared as specified by the manufacturer. The high molecular weight DNA was subjected to partial digestion at 37 degrees Celsius with approximately one unit of Sau3AI restriction enzyme (New England Biolabs) per 100 micrograms of DNA in the buffer supplied by the manufacturer. This enzyme generates random fragments of DNA ranging from the initial undigested size of the DNA to short fragments of which the length is dependent upon the frequency of the enzyme DNA recognition site in the genome and the extent of the DNA digestion. At various time points, aliquots of the digestion were transferred to new microfuge tubes and the enzyme was inactivated by adding a final concentration of 10 mM EDTA and 0.1% SDS. Aliquots judged by FIGE analysis to contain a significant fraction of DNA in the desired size range (30-50 kb) were pooled, extracted with phenol/chloroform (1:1 vol:vol), and pelleted by ethanol precipitation.

The 5′ ends of Sau3AI DNA fragments were dephosphorylated using alkaline phosphatase (Roche) according to the manufacturer's specifications at 37 degrees Celsius for 30 min. The phosphatase was heat inactivated at 70 degrees Celsius for 10 min and the DNA was extracted with phenol/chloroform (1:1 vol:vol), pelleted by ethanol precipitation, and resuspended in sterile water. The dephosphorylated Sau3AI DNA fragments were then ligated overnight at room temperature to the SuperCos-1 cosmid arms in a reaction containing approximately four-fold molar excess SuperCos-1 cosmid arms.

The ligation products were packaged using Gigapack® III XL packaging extracts (Stratagene™) according to the manufacturer's specifications. The CIL library consisted of 864 isolated cosmid clones in E. coli DH10B (Invitrogen). These clones were picked and inoculated into nine 96-well microtiter plates containing LB broth (per liter of water: 10.0 g NaCl; 10.0 g tryptone; 5.0 g yeast extract) which were grown overnight and then adjusted to contain a final concentration of 25% glycerol. These microtiter plates were stored at −80 degrees Celsius and served as glycerol stocks of the CIL library. Duplicate microtiter plates were arrayed onto nylon membranes as follows. Cultures grown on microtiter plates were concentrated by pelleting and resuspending in a small volume of LB broth. A 3×3 96-pin-grid was spotted onto nylon membranes.

The membranes, representing the complete CIL library, were then layered onto LB agar and incubated overnight at 37 degrees Celsius to allow the colonies to grow. The membranes were layered onto filter paper pre-soaked with 0.5 N NaOH/1.5 M NaCl for 10 min to denature the DNA and then neutralized by transferring onto filter paper pre-soaked with 0.5 M Tris (pH 8)/1.5 M NaCl for 10 min. Cell debris was gently scraped off with a plastic spatula and the DNA was crosslinked onto the membranes by UV irradiation using a GS GENE LINKER™ UV Chamber (BIORAD). Considering an average size of 8 Mb for an actinomycete genome and an average size of 35 kb of genomic insert in the CIL library, this library represents roughly a 4-fold coverage of the microorganism's entire genome.
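
The coverage estimate can be checked with a short calculation, given here as an illustrative Python sketch. The function name library_coverage is hypothetical; the inputs are the 864 cosmid clones, the 35 kb average insert and the 8 Mb genome size stated in this example, with 1 Mb taken as 1000 kb.

```python
def library_coverage(n_clones: int, avg_insert_kb: float, genome_mb: float) -> float:
    """Fold coverage = total cloned DNA divided by genome size (both in kb)."""
    total_insert_kb = n_clones * avg_insert_kb
    genome_kb = genome_mb * 1000.0
    return total_insert_kb / genome_kb


# 864 clones x 35 kb = 30,240 kb of cloned DNA against an 8,000 kb genome.
coverage = library_coverage(864, 35.0, 8.0)
print(f"{coverage:.2f}-fold coverage")
```

The result is slightly below four, consistent with the "roughly 4-fold" figure given above.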

The GSL library was analyzed by sequence determination of the cloned genomic DNA inserts. The universal primers KS or T7, referred to as forward (F) primers, were used to initiate polymerization of labeled DNA. Extension of at least 700 bp from the priming site can be routinely achieved using the TF, BDT v2.0 sequencing kit as specified by the supplier (Applied Biosystems). Sequence analysis of the small genomic DNA fragments (Genomic Sequence Tags, GSTs) was performed using a 3700 ABI capillary electrophoresis DNA sequencer (Applied Biosystems). The average length of the DNA sequence reads was ~700 bp. Further analysis of the obtained GSTs was performed by sequence homology comparison to various protein sequence databases. The DNA sequences of the obtained GSTs were translated into amino acid sequences and compared to the National Center for Biotechnology Information (NCBI) nonredundant protein database and the proprietary Ecopia natural product biosynthetic gene Decipher™ database using previously described algorithms (Altschul et al., supra). Sequence similarity with known proteins of defined function in the database enables one to make predictions on the function of the partial protein that is encoded by the translated GST.
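The translation step mentioned above can be illustrated with a minimal sketch. It assumes the standard genetic code and translates a read in all six reading frames. The function names are hypothetical, and the source does not state how the translation was actually carried out (a translated Blast search normally performs this step internally).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption;
# the source does not specify which translation table was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate from position 0; unknown codons (e.g. with N) become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Translate a read in the three forward and three reverse frames."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Each of the six conceptual translations of a ~700 bp GST yields a partial protein of roughly 230 residues that can then be compared to a protein database.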

A total of 479 S. macromyceticus GSTs obtained with the forward sequencing primer were analyzed by sequence comparison using the Blast algorithm (Altschul et al., supra). Sequence alignments displaying an E value of e-5 or lower were considered as significantly homologous and retained for further evaluation. GSTs showing similarity to a gene of interest can at this point be selected and used to identify larger segments of genomic DNA from the CIL library that include the gene(s) of interest. Several S. macromyceticus GSTs that contained genes of interest were pursued. One of these GSTs encoded a portion of an oxidoreductase based on Blast analysis of the forward read and a portion of the macromomycin apoprotein based on Blast analysis of the reverse read. Oligonucleotide probes derived from such GSTs were used to screen the CIL library and the resulting positive cosmid clones were sequenced. Overlapping cosmid clones provided in excess of 125 kb of sequence information surrounding the macromomycin apoprotein gene (FIG. 5).
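The significance filter described above can be sketched as follows. This is an illustration under stated assumptions: the cutoff is read as retaining hits whose E value is e-5 or smaller, and the record type, GST identifiers, and E values are hypothetical placeholders rather than data from the source.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    gst_id: str      # hypothetical GST identifier
    subject: str     # description of the database match
    e_value: float   # Blast expectation value (smaller = more significant)


E_VALUE_CUTOFF = 1e-5  # assumption: hits at or below this value are retained


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep only the hits whose E value is at or below the cutoff."""
    return [h for h in hits if h.e_value <= cutoff]


# Hypothetical example records, for illustration only.
hits = [
    BlastHit("GST_001", "oxidoreductase", 3e-12),
    BlastHit("GST_002", "hypothetical protein", 0.4),
    BlastHit("GST_003", "enediyne apoprotein", 1e-5),
]
retained = [h.gst_id for h in significant_hits(hits)]
print(retained)  # GST_002 is discarded because 0.4 exceeds the cutoff
```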

Hybridization oligonucleotide probes were radiolabeled with ³²P using T4 polynucleotide kinase (New England Biolabs) in 15 microliter reactions containing 5 picomoles of oligonucleotide and 6.6 picomoles of [γ-³²P]ATP in the kinase reaction buffer supplied by the manufacturer. After 1 hour at 37 degrees Celsius, the kinase reaction was terminated by the addition of EDTA to a final concentration of 5 mM. The specific activity of the radiolabeled oligonucleotide probes was estimated using a Model 3 Geiger counter (Ludlum Measurements Inc., Sweetwater, Tex.) with a built-in integrator feature. The radiolabeled oligonucleotide probes were heat-denatured by incubation at 85 degrees Celsius for 10 minutes and quick-cooled in an ice bath immediately prior to use.

The S. macromyceticus CIL library membranes were pretreated by incubation for at least 2 hours at 42 degrees Celsius in Prehyb Solution (6×SSC; 20 mM NaH₂PO₄; 5× Denhardt's; 0.4% SDS; 0.1 mg/ml sonicated, denatured salmon sperm DNA) using a hybridization oven with gentle rotation. The membranes were then placed in Hyb Solution (6×SSC; 20 mM NaH₂PO₄; 0.4% SDS; 0.1 mg/ml sonicated, denatured salmon sperm DNA) containing 1×10⁶ cpm/ml of radiolabeled oligonucleotide probe and incubated overnight at 42 degrees Celsius using a hybridization oven with gentle rotation. The next day, the membranes were washed with Wash Buffer (6×SSC, 0.1% SDS) for 45 minutes each at 46, 48, and 50 degrees Celsius using a hybridization oven with gentle rotation. The S. macromyceticus CIL membranes were then exposed to X-ray film to visualize and identify the positive cosmid clones. Positive clones were identified, cosmid DNA was extracted from 30 ml cultures using the alkaline lysis method (Sambrook et al., supra) and the inserts were entirely sequenced using a shotgun sequencing approach (Fleischmann et al., (1995) Science, 269:496-512).

Sequencing reads were assembled using the Phred-Phrap™ algorithm (University of Washington, Seattle, USA), recreating the entire DNA sequence of the cosmid insert. Reiterations of hybridizations of the CIL library with probes derived from the ends of the original cosmid allow indefinite extension of sequence information on both sides of the original cosmid sequence until the complete sought-after gene cluster is obtained. The structure of macromomycin (auromomycin) has not been elucidated; however, the apoprotein component has been well characterized (Van Roey and Beerman (1989) Proc Natl Acad Sci USA Vol. 86 pp. 6587-6591). An unusual polyketide synthase (PKSE) was found approximately 40 kb upstream of the macromomycin apoprotein gene (FIG. 5). No other polyketide synthase or fatty acid synthase gene cluster was found in the vicinity of the macromomycin apoprotein gene, suggesting that the PKSE may be the only polyketide synthase involved in the biosynthesis of macromomycin (auromomycin).

Four other enediyne-specific genes clustered with or in close proximity to the PKSE gene were found in the macromomycin biosynthetic locus. These genes and the polypeptides that they encode have been assigned the family designations TEBC, UNBL, UNBV, and UNBU. The macromomycin locus contains two copies of the TEBC gene (FIG. 6, Table 2). Table 2 lists the results of sequence comparison using the Blast algorithm (Altschul et al., supra) for each of these enediyne-specific polypeptides from the macromomycin locus. Homology was determined using the BLASTP algorithm with the default parameters.

TABLE 2. MACR locus: GenBank homology

| Family | #aa | Accession, #aa | probability | identity | similarity | proposed function of GenBank match |
|---|---|---|---|---|---|---|
| PKSE | 1936 | T37056, 2082aa | 6e−86 | 273/897 (30.43%) | 372/897 (41.47%) | multi-domain beta keto-acyl synthase, Streptomyces coelicolor |
| | | NP_485686.1, 1263aa | 5e−82 | 256/900 (28.44%) | 388/900 (43.11%) | heterocyst glycolipid synthase, Nostoc sp. |
| | | AAL01060.1, 2573aa | 6e−78 | 244/884 (27.6%) | 376/884 (42.53%) | polyunsaturated fatty acid synthase, Photobacterium profundum |
| TEBC1 | 162 | NP_249659.1, 148aa | 4e−06 | 38/134 (28.36%) | 59/134 (44.03%) | hypothetical protein, Pseudomonas aeruginosa |
| | | CAB50777.1, 150aa | 4e−06 | 39/145 (26.9%) | 65/145 (44.83%) | hypothetical protein, Pseudomonas putida |
| | | NP_214031.1, 128aa | 2e−04 | 33/129 (25.58%) | 55/129 (42.64%) | hypothetical protein, Aquifex aeolicus |
| TEBC2 | 157 | NP_242865.1, 138aa | 0.27 | 31/131 (23%) | 50/131 (37%) | 4-hydroxybenzoyl-CoA thioesterase, Bacillus halodurans |
| UNBL | 327 | NP_422192.1, 423aa | 0.095 | 30/86 (34.88%) | 40/86 (46.51%) | peptidase, Caulobacter crescentus |
| UNBV | 642 | NO HOMOLOG | | | | |
| UNBU | 433 | NP_486037.1, 300aa | 1e−06 | 49/179 (27.37%) | 83/179 (46.37%) | hypothetical protein, Nostoc sp. |
| | | NP_107088.1, 503aa | 2e−04 | 72/280 (25.71%) | 126/280 (45%) | hypothetical protein, Mesorhizobium loti |
| | | NP_440874.1, 285aa | 4e−04 | 47/193 (24.35%) | 86/193 (44.56%) | hypothetical protein, Synechocystis sp. |
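The identity and similarity percentages in Table 2 follow from the matched/aligned residue counts reported with them. A minimal sketch of that arithmetic, using a hypothetical helper name and the first PKSE entry of Table 2 as input:

```python
def percent(fraction: str) -> float:
    """Convert a 'matched/aligned' count such as '273/897' to a percentage."""
    matched, aligned = (int(x) for x in fraction.split("/"))
    return round(100.0 * matched / aligned, 2)


# First PKSE match in Table 2 (T37056): identity 273/897, similarity 372/897.
print(percent("273/897"))  # 30.43
print(percent("372/897"))  # 41.47
```

Note that the denominator is the length of the aligned region (897 residues here), not the full length of either protein (1936 and 2082 residues), so the percentages describe only the aligned segment.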

The macromomycin genes listed in Table 2 are arranged as depicted in FIG. 6. The UNBL, UNBV, UNBU, PKSE, and TEBC1 genes span approximately 10.5 kb and are tandemly arranged in the order listed. Thus these five genes may constitute an operon. A second TEBC gene (TEBC2) is found approximately 6.6 kb downstream of the 5-gene enediyne-specific cassette. The macromomycin enediyne-specific cassette is composed of six functionally linked genes and polypeptides, five of which may be expressed as a single operon.

Example 2 Identification and Sequencing of the Calicheamicin Biosynthetic Locus

Calicheamicin is a non-chromoprotein enediyne produced by Micromonospora echinospora subsp. calichensis NRRL 15839. Both GSL and CIL genomic DNA libraries of M. echinospora genomic DNA were prepared as described in Example 1. A total of 288 GSL clones were sequenced with the forward primer and analyzed by sequence comparison using the Blast algorithm (Altschul et al., supra) to identify those clones that contained inserts related to the macromomycin (auromomycin) biosynthetic genes, particularly the PKSE. Such GST clones were identified and were used to isolate cosmid clones from the M. echinospora CIL library. Overlapping cosmid clones were sequenced and assembled as described in Example 1. The resulting DNA sequence information was more than 125 kb in length and included the calicheamicin genes described in WO 00/37608. The calicheamicin biosynthetic genes disclosed in WO 00/37608 span only from 37140 bp to 59774 bp in FIG. 5 and do not include the unusual PKS gene (PKSE) and four other flanking genes (UNBL, UNBV, UNBU, and TEBC) that are homologous to those in the macromomycin biosynthetic locus. Table 3 lists the results of sequence comparison using the Blast algorithm (Altschul et al., supra) for each of these enediyne-specific polypeptides from the calicheamicin locus. Homology was determined using the BLASTP algorithm with the default parameters.

TABLE 3. CALI locus: GenBank homology

| Family | #aa | Accession, #aa | probability | identity | similarity | proposed function of GenBank match |
|---|---|---|---|---|---|---|
| PKSE | 1919 | AAF26923.1, 2439aa | 1e−60 | 228/876 (26.03%) | 317/876 (36.19%) | polyketide synthase, Polyangium cellulosum |
| | | NP_485686.1, 1263aa | 5e−59 | 148/461 (32.1%) | 210/461 (45.55%) | heterocyst glycolipid synthase, Nostoc sp. |
| | | T37056, 2082aa | 9e−58 | 161/466 (34.55%) | 213/466 (45.71%) | multi-domain beta keto-acyl synthase, Streptomyces coelicolor |
| TEBC | 148 | NP_249659.1, 148aa | 8e−06 | 41/133 (30.83%) | 62/133 (46.62%) | hypothetical protein, Pseudomonas aeruginosa |
| | | AAD49752.1, 148aa | 1e−05 | 41/138 (29.71%) | 63/138 (45.65%) | orf1, Pseudomonas aeruginosa |
| | | NP_242865.1, 138aa | 2e−04 | 32/130 (24.62%) | 56/130 (43.08%) | 4-hydroxybenzoyl-CoA thioesterase, Bacillus halodurans |
| UNBL | 322 | NO HOMOLOG | | | | |
| UNBV | 651 | NO HOMOLOG | | | | |
| UNBU | 321 | NP_486037.1, 300aa | 8e−09 | 61/210 (29.05%) | 99/210 (47.14%) | hypothetical protein, Nostoc sp. |
| | | NP_107088.1, 503aa | 5e−05 | 58/208 (27.88%) | 96/208 (46.15%) | hypothetical protein, Mesorhizobium loti |

The calicheamicin genes listed in Table 3 are arranged as depicted in FIG. 6. The UNBL, UNBV, UNBU, PKSE, and TEBC genes span approximately 10.5 kb and are tandemly arranged in the order listed. Thus these five genes may constitute an operon. Therefore, the calicheamicin enediyne-specific cassette is composed of five functionally linked genes and polypeptides that may be expressed as a single operon.

Example 3 Identification and Sequencing of the Biosynthetic Locus for an Unknown Chromoprotein Enediyne in Streptomyces ghanaensis

The genomic sampling method described in Example 1 was applied to genomic DNA from Streptomyces ghanaensis NRRL B-12104. S. ghanaensis has not previously been described to produce enediyne compounds. Both GSL and CIL genomic DNA libraries of S. ghanaensis genomic DNA were prepared as described in Example 1. A total of 435 GSL clones were sequenced with the forward primer and analyzed by sequence comparison using the Blast algorithm (Altschul et al., supra).

Surprisingly, two GSTs from S. ghanaensis were identified as encoding portions of genes in the 5-gene cassette common to both the macromomycin and calicheamicin enediyne biosynthetic loci. One of these GSTs encoded a portion of a TEBC homologue and the other encoded a portion of a UNBV homologue. These S. ghanaensis GSTs were subsequently found in a genetic locus referred to herein as 009C (FIG. 5). As in the macromomycin and calicheamicin enediyne biosynthetic loci, the UNBV and TEBC genes in 009C were found to flank a PKSE gene and to lie adjacent to UNBL and UNBU genes. The 009C locus included a gene encoding a homologue of the macromomycin apoprotein approximately 50 kb downstream of the UNBV-UNBU-UNBL-PKSE-TEBC cassette. The presence of the 5-gene cassette in the vicinity of an apoprotein suggests that 009C represents a biosynthetic locus for an unknown chromoprotein enediyne that was not previously described to be produced by S. ghanaensis NRRL B-12104.

Table 4 lists the results of sequence comparison using the Blast algorithm (Altschul et al., supra) for each of these enediyne-specific polypeptides from the 009C locus. Homology was determined using the BLASTP algorithm with the default parameters.

TABLE 4. 009C locus: GenBank homology

| Family | #aa | Accession, #aa | probability | identity | similarity | proposed function of GenBank match |
|---|---|---|---|---|---|---|
| PKSE | 1956 | T37056, 2082aa | 1e−101 | 298/902 (33.04%) | 395/902 (43.79%) | multi-domain beta keto-acyl synthase, Streptomyces coelicolor |
| | | NP_485686.1, 1263aa | 2e−99 | 274/900 (30.44%) | 407/900 (45.22%) | heterocyst glycolipid synthase, Nostoc sp. |
| | | BAB69208.1, 2365aa | 3e−89 | 282/880 (32.05%) | 366/880 (41.59%) | polyketide synthase, Streptomyces avermitilis |
| TEBC | 152 | NP_249659.1, 148aa | 5e−07 | 39/131 (29.77%) | 59/131 (45.04%) | hypothetical protein, Pseudomonas aeruginosa |
| | | NP_231474.1, 155aa | 2e−04 | 30/129 (23.26%) | 62/129 (48.06%) | hypothetical protein, Vibrio cholerae |
| | | NP_214031.1, 128aa | 2e−04 | 31/128 (24.22%) | 55/128 (42.97%) | hypothetical protein, Aquifex aeolicus |
| UNBL | 329 | NO HOMOLOG | | | | |
| UNBV | 636 | NP_615809.1, 2275aa | 6e−05 | 72/314 (22.93%) | 114/314 (36.31%) | cell surface protein, Methanosarcina acetivorans |
| UNBU | 382 | NP_486037.1, 300aa | 4e−07 | 46/175 (26.29%) | 81/175 (46.29%) | hypothetical protein, Nostoc sp. |
| | | NP_107088.1, 503aa | 6e−06 | 68/255 (26.67%) | 118/255 (46.27%) | hypothetical protein, Mesorhizobium loti |

The 009C genes listed in Table 4 are arranged as depicted in FIG. 6. The UNBL, UNBV, UNBU, PKSE, and TEBC genes span approximately 10.5 kb and are tandemly arranged in the order listed. These five genes may constitute an operon. Therefore, the 009C enediyne-specific cassette is composed of five functionally linked genes and polypeptides that may be expressed as a single operon.

Example 4 The 5-Gene Enediyne Cassette is Present in the Neocarzinostatin Biosynthetic Locus

Neocarzinostatin is a chromoprotein enediyne produced by Streptomyces carzinostaticus subsp. neocarzinostaticus ATCC 15944. The neocarzinostatin biosynthetic locus was sequenced and was shown to contain, in addition to the neocarzinostatin apoprotein gene, the 5-gene cassette that is present in the macromomycin and calicheamicin enediyne biosynthetic loci. The genes and proteins involved in the biosynthesis of neocarzinostatin are disclosed in co-pending application U.S. Ser. No. 60/354,474. The presence of the 5-gene cassette in the neocarzinostatin biosynthetic locus reconfirms that it is present in all enediyne biosynthetic loci.

Table 5 lists the results of sequence comparison using the Blast algorithm (Altschul et al., supra) for each of these enediyne-specific polypeptides from the neocarzinostatin locus. Homology was determined using the BLASTP algorithm with the default parameters.

TABLE 5. NEOC locus: GenBank homology

| Family | #aa | Accession, #aa | probability | identity | similarity | proposed function of GenBank match |
|---|---|---|---|---|---|---|
| PKSE | 1977 | T37056, 2082aa | 7e−93 | 285/891 (31.99%) | 384/891 (43.1%) | multi-domain beta keto-acyl synthase, Streptomyces coelicolor |
| | | NP_485686.1, 1263aa | 8e−88 | 261/890 (29.33%) | 397/890 (44.61%) | heterocyst glycolipid synthase, Nostoc sp. |
| | | BAB69208.1, 2365aa | 2e−85 | 276/876 (31.51%) | 370/876 (42.24%) | polyketide synthase, Streptomyces avermitilis |
| TEBC | 153 | NP_249659.1, 148aa | 3e−06 | 37/129 (28.68%) | 56/129 (43.41%) | hypothetical protein, Pseudomonas aeruginosa |
| | | CAB50777.1, 150aa | 1e−04 | 32/114 (28.07%) | 53/114 (46.49%) | hypothetical protein, Pseudomonas putida |
| | | NP_214031.1, 128aa | 2e−04 | 34/129 (26.36%) | 55/129 (42.64%) | hypothetical protein, Aquifex aeolicus |
| UNBL | 328 | | | | | |
| UNBV | 636 | NP_618575.1, 1881aa | 2e−05 | 77/317 (24.29%) | 117/317 (36.91%) | cell surface protein, Methanosarcina acetivorans |
| UNBU | 364 | NP_107088.1, 503aa | 2e−05 | 49/158 (31.01%) | 79/158 (50%) | hypothetical protein, Mesorhizobium loti |
| | | NP_486037.1, 300aa | 8e−05 | 33/126 (26.19%) | 60/126 (47.62%) | hypothetical protein, Nostoc sp. |

The neocarzinostatin genes listed in Table 5 are arranged as depicted in FIG. 6. The UNBL, UNBV, UNBU, PKSE, and TEBC genes span approximately 10.5 kb and are tandemly arranged in the order listed. Thus these five genes may constitute an operon. Therefore, the neocarzinostatin enediyne-specific cassette is composed of five functionally linked genes and polypeptides that may be expressed as a single operon.

Example 5 The 5-Gene Enediyne Cassette is Present in the Biosynthetic Locus of an Unknown Chromoprotein Enediyne in Amycolatopsis orientalis

The genomic sampling method described in Example 1 was applied to genomic DNA from Amycolatopsis orientalis ATCC 43491. A. orientalis has not previously been described to produce enediyne compounds. Both GSL and CIL genomic DNA libraries of A. orientalis genomic DNA were prepared as described in Example 1.

A total of 1025 GSL clones were sequenced with the forward primer and analyzed by sequence comparison using the Blast algorithm (Altschul et al., supra). Several secondary metabolism loci were identified and sequenced as described in Example 1. One of these loci (herein referred to as 007A) includes a 5-gene cassette common to all enediyne biosynthetic loci. The arrangement of the five genes of the cassette in 007A is shown in FIG. 6. Interestingly, the A. orientalis genome also contains an enediyne apoprotein gene that is similar to that from the macromomycin and 009C loci as well as other chromoprotein enediynes (data not shown). Therefore, A. orientalis, the producer of the well-known glycopeptide antibiotic vancomycin, has the genomic potential to produce a chromoprotein enediyne.

Table 6 lists the results of sequence comparison using the Blast algorithm (Altschul et al., supra) for each of the enediyne-specific polypeptides from the 007A locus. Homology was determined using the BLASTP algorithm with the default parameters.

TABLE 6. 007A locus: GenBank homology

| Family | #aa | Accession, #aa | probability | identity | similarity | proposed function of GenBank match |
|---|---|---|---|---|---|---|
| PKSE | 1939 | T37056, 2082aa | 5e−96 | 291/906 (32.12%) | 399/906 (44.04%) | multi-domain beta keto-acyl synthase, Streptomyces coelicolor |
| | | NP_485686.1, 1263aa | 9e−87 | 255/897 (28.43%) | 395/897 (44.04%) | heterocyst glycolipid synthase, Nostoc sp. |
| | | BAB69208.1, 2365aa | 8e−86 | 285/926 (30.78%) | 393/926 (42.44%) | modular polyketide synthase, Streptomyces avermitilis |
| TEBC | 146 | NP_214031.1, 128aa | 0.052 | 28/124 (22.58%) | 51/124 (41.13%) | hypothetical protein, Aquifex aeolicus |
| UNBL | 324 | NO HOMOLOG | | | | |
| UNBV | 654 | NP_618575.1, 1881aa | 0.001 | 80/332 (24.1%) | 117/332 (35.24%) | cell surface protein, Methanosarcina acetivorans |
| UNBU | 329 | NP_486037.1, 300aa | 0.005 | 56/245 (22.86%) | 96/245 (39.18%) | hypothetical protein, Nostoc sp. |

The 007A genes listed in Table 6 are arranged as depicted in FIG. 6. The UNBL, UNBV, and UNBU genes span approximately 4 kb and are tandemly arranged in the order listed. The PKSE and TEBC genes span approximately 6.5 kb and are tandemly arranged in the order listed. Thus these five genes may constitute two operons. The two putative operons are separated by approximately 5 kb. Although these two clusters of genes may not be transcriptionally linked to one another, they are still functionally linked. Therefore, the 007A enediyne-specific cassette is composed of five functionally linked genes and polypeptides, three of which may be expressed as one operon and two of which may be expressed as a second operon.

Example 6 The 5-Gene Enediyne Cassette is Present in the Biosynthetic Locus of an Unknown Enediyne in Kitasatosporia sp. CECT 4991

The genomic sampling method described in Example 1 was applied to genomic DNA from Kitasatosporia sp. CECT 4991. This organism was not previously described to produce enediyne compounds. Both GSL and CIL genomic DNA libraries of Kitasatosporia sp. genomic DNA were prepared as described in Example 1.

A total of 1390 GSL clones were sequenced with the forward primer and analyzed by sequence comparison using the Blast algorithm (Altschul et al., supra). Surprisingly, two GSTs from Kitasatosporia sp. were identified as encoding portions of genes in the 5-gene cassette common to enediyne biosynthetic loci. One of these GSTs encoded a portion of a PKSE homologue and the other encoded a portion of a UNBV homologue. These Kitasatosporia sp. GSTs were subsequently found in a genetic locus referred to herein as 028D which includes a 5-gene cassette common to all enediyne biosynthetic loci. The arrangement of the five genes of the cassette in 028D is shown in FIG. 6. Therefore, Kitasatosporia sp. CECT 4991 has the genomic potential to produce enediyne compound(s).

Table 7 lists the results of sequence comparison using the Blast algorithm (Altschul et al., supra) for each of the enediyne-specific polypeptides from the 028D locus. Homology was determined using the BLASTP algorithm with the default parameters.

TABLE 7. 028D locus: GenBank homology

| Family | #aa | Accession, #aa | probability | identity | similarity | proposed function of GenBank match |
|---|---|---|---|---|---|---|
| PKSE | 1958 | BAB69208.1, 2365aa | 1e−81 | 273/926 (29.48%) | 354/926 (38.23%) | polyketide synthase, Streptomyces avermitilis |
| | | T37056, 2082aa | 3e−78 | 263/895 (29.39%) | 356/895 (39.78%) | multi-domain beta keto-acyl synthase, Streptomyces coelicolor |
| | | NP_485686.1, 1263aa | 7e−71 | 231/875 (26.4%) | 345/875 (39.43%) | heterocyst glycolipid synthase, Nostoc sp. |
| TEBC | 158 | NP_249659.1, 148aa | 1e−04 | 38/133 (28.57%) | 61/133 (45.86%) | hypothetical protein, Pseudomonas aeruginosa |
| | | AAD49752.1, 148aa | 3e−04 | 38/138 (27.54%) | 62/138 (44.93%) | orf1, Pseudomonas aeruginosa |
| | | NP_231474.1, 155aa | 7e−04 | 31/127 (24.41%) | 61/127 (48.03%) | hypothetical protein, Vibrio cholerae |
| UNBL | 327 | NO HOMOLOG | | | | |
| UNBV | 676 | NO HOMOLOG | | | | |
| UNBU | 338 | NP_486037.1, 300aa | 5e−08 | 66/240 (27.5%) | 105/240 (43.75%) | hypothetical protein, Nostoc sp. |
| | | NP_440874.1, 285aa | 2e−04 | 51/190 (26.84%) | 98/190 (51.58%) | hypothetical protein, Synechocystis sp. |

The 028D genes listed in Table 7 are arranged as depicted in FIG. 6. The UNBV, UNBU, PKSE, and TEBC genes span approximately 9.5 kb and are tandemly arranged in the order listed. Thus these four genes may constitute an operon. This putative operon is separated from the UNBL gene, which is oriented in the opposite direction relative to the putative operon, by approximately 10.5 kb. Although the UNBL gene cannot be transcriptionally linked to the other genes, it is still functionally linked to them. Therefore, the 028D enediyne-specific cassette is composed of five functionally linked genes and polypeptides, four of which may be expressed as a single operon. Although expression of functionally linked enediyne-specific genes may be under control of distinct transcriptional promoters, they may, nonetheless, be expressed in a concerted fashion. As depicted in FIG. 6, the 028D biosynthetic locus is unique in that it is the only example whose enediyne-specific genes are not all oriented in the same direction.
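The reasoning used throughout these Examples, that tandemly arranged genes in the same orientation may form an operon while a distant or oppositely oriented gene cannot, can be expressed as a simple grouping rule. The sketch below is illustrative only. The coordinates are hypothetical (gene lengths are approximated from the amino acid counts in Table 7, and the position of UNBL relative to the other genes is assumed), and the 500 bp intergenic threshold is an arbitrary assumption, not a value from the source.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # hypothetical coordinate in bp
    end: int     # hypothetical coordinate in bp
    strand: str  # "+" or "-"


def putative_operons(genes, max_gap_bp=500):
    """Group neighbouring genes that share a strand and lie close together."""
    groups = []
    for gene in sorted(genes, key=lambda g: g.start):
        prev = groups[-1][-1] if groups else None
        same_unit = (
            prev is not None
            and gene.strand == prev.strand
            and gene.start - prev.end <= max_gap_bp
        )
        if same_unit:
            groups[-1].append(gene)
        else:
            groups.append([gene])
    return [[g.name for g in group] for group in groups]


# Hypothetical 028D-like layout: UNBL on the opposite strand, ~10.5 kb away
# from a ~9.5 kb run of four tandem genes.
locus_028d = [
    Gene("UNBL", 0, 1_000, "-"),
    Gene("UNBV", 11_500, 13_531, "+"),
    Gene("UNBU", 13_540, 14_557, "+"),
    Gene("PKSE", 14_570, 20_447, "+"),
    Gene("TEBC", 20_450, 20_927, "+"),
]
print(putative_operons(locus_028d))
# [['UNBL'], ['UNBV', 'UNBU', 'PKSE', 'TEBC']]
```

Grouping of this kind only proposes candidate transcription units; as the text notes, genes in separate units may still be functionally linked and expressed in a concerted fashion.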

Example 7 The 5-Gene Enediyne Cassette is Present in the Biosynthetic Locus of an Unknown Enediyne in Micromonospora megalomicea

The genomic sampling method described in Example 1 was applied to genomic DNA from Micromonospora megalomicea NRRL 3275. This organism was not previously described to produce enediyne compounds. Both GSL and CIL genomic DNA libraries of M. megalomicea genomic DNA were prepared as described in Example 1.

A total of 1390 GSL clones were sequenced with the forward primer and analyzed by sequence comparison using the Blast algorithm (Altschul et al., supra). Surprisingly, one GST from M. megalomicea was identified as encoding a portion of the PKSE gene present in the 5-gene cassette common to enediyne biosynthetic loci. The forward read of this GST encoded the C-terminal portion of the KS domain and the N-terminal portion of the AT domain of a PKSE gene. The complement of the reverse read of this GST encoded the C-terminal portion of the AT domain of a PKSE gene. This M. megalomicea GST was subsequently found in a genetic locus referred to herein as 054A which includes a 5-gene cassette common to all enediyne biosynthetic loci. The arrangement of the five genes of the cassette in 054A is shown in FIG. 6. Therefore, M. megalomicea has the genomic potential to produce enediyne compound(s).

Table 8 lists the results of sequence comparison using the Blast algorithm (Altschul et al., supra) for each of the enediyne-specific polypeptides from the 054A locus. Homology was determined using the BLASTP algorithm with the default parameters.

TABLE 8. 054A locus: GenBank homology

| Family | #aa | Accession, #aa | probability | identity | similarity | proposed function of GenBank match |
|---|---|---|---|---|---|---|
| PKSE | 1927 | NP_485686.1, 1263aa | 3e−76 | 247/886 (27.88%) | 365/886 (41.2%) | heterocyst glycolipid synthase, Nostoc sp. |
| | | T37056, 2082aa | 3e−75 | 269/903 (29.79%) | 354/903 (39.2%) | multi-domain beta keto-acyl synthase, Streptomyces coelicolor |
| | | BAB69208.1, 2365aa | 9e−74 | 277/923 (30.01%) | 359/923 (38.89%) | polyketide synthase, Streptomyces avermitilis |
| TEBC | 154 | NP_249659.1, 148aa | 2e−06 | 43/147 (29.25%) | 66/147 (44.9%) | hypothetical protein, Pseudomonas aeruginosa |
| | | AAD49752.1, 148aa | 2e−05 | 42/147 (28.57%) | 65/147 (44.22%) | orf1, Pseudomonas aeruginosa |
| | | CAB50777.1, 150aa | 1e−04 | 40/139 (28.78%) | 61/139 (43.88%) | hypothetical protein, Pseudomonas putida |
| UNBL | 322 | NO HOMOLOG | | | | |
| UNBV | 659 | CAC44518.1, 706aa | 0.048 | 50/166 (30.12%) | 67/166 (40.36%) | putative secreted esterase, Streptomyces coelicolor |
| UNBU | 354 | NP_486037.1, 300aa | 5e−06 | 66/268 (24.63%) | 118/268 (44.03%) | hypothetical protein, Nostoc sp. |

The 054A genes listed in Table 8 are arranged as depicted in FIG. 6. The UNBL, PKSE, and TEBC genes span approximately 7.5 kb and are tandemly arranged in the order listed. The UNBV and UNBU genes span approximately 3 kb and are tandemly arranged in the order listed. Thus these five genes may constitute two operons. The two putative operons are separated by approximately 2 kb. Therefore, the 054A enediyne-specific cassette is composed of five functionally linked genes and polypeptides, three of which may be expressed as one operon and two of which may be expressed as another operon.

Example 8 The 5-Gene Enediyne Cassette is Present in the Biosynthetic Locus of an Unknown Enediyne in Saccharothrix aerocolonigenes

The genomic sampling method described in Example 1 was applied to genomic DNA from Saccharothrix aerocolonigenes ATCC 39243. This organism was not previously described to produce enediyne compounds. Both GSL and CIL genomic DNA libraries of Saccharothrix aerocolonigenes genomic DNA were prepared as described in Example 1.

A total of 513 GSL clones were sequenced with the forward primer and analyzed by sequence comparison using the Blast algorithm (Altschul et al., supra). Several secondary metabolism loci were identified and sequenced as described in Example 1. One of these loci (herein referred to as 132H) includes a 5-gene cassette common to all enediyne biosynthetic loci. The arrangement of the five genes of the cassette in 132H is shown in FIG. 6. Therefore, Saccharothrix aerocolonigenes has the genomic potential to produce enediyne compound(s).

Table 9 lists the results of sequence comparison using the Blast algorithm (Altschul et al., supra) for each of these enediyne-specific polypeptides from the 132H locus. Homology was determined using the BLASTP algorithm with the default parameters.

TABLE 9. 132H locus: GenBank homology

| Family | #aa | Accession, #aa | probability | identity | similarity | proposed function of GenBank match |
|---|---|---|---|---|---|---|
| PKSE | 1892 | BAB69208.1, 2365aa | 1e−108 | 312/872 (35.78%) | 404/872 (46.33%) | polyketide synthase, Streptomyces avermitilis |
| | | T37056, 2082aa | 1e−101 | 290/886 (32.73%) | 407/886 (45.94%) | multi-domain beta keto-acyl synthase, Streptomyces coelicolor |
| | | T30183, 2756aa | 4e−94 | 271/886 (30.59%) | 398/886 (44.92%) | hypothetical protein, Shewanella sp. |
| TEBC | 143 | NP_442358.1, 138aa | 0.001 | 32/127 (25.2%) | 48/127 (37.8%) | hypothetical protein, Synechocystis sp. |
| UNBL | 313 | NO HOMOLOG | | | | |
| UNBV | 647 | AAD34550.1, 1529aa | 0.012 | 76/304 (25%) | 105/304 (34.54%) | esterase, Aspergillus terreus |
| UNBU | 336 | NP_486037.1, 300aa | 1e−04 | 42/172 (24.42%) | 79/172 (45.93%) | hypothetical protein, Nostoc sp. |
| | | NP_440874.1, 285aa | 1e−04 | 48/181 (26.52%) | 90/181 (49.72%) | hypothetical protein, Synechocystis sp. |

The 132H genes listed in Table 9 are arranged as depicted in FIG. 6. The UNBL, UNBV, UNBU, PKSE, and TEBC genes span approximately 10.5 kb and are tandemly arranged in the order listed. Thus, these five genes may constitute an operon. Therefore, the 132H enediyne-specific cassette is composed of five functionally linked genes and polypeptides that may be expressed as a single operon.

Example 9 The 5-Gene Enediyne Cassette is Present in the Biosynthetic Locus of an Unknown Enediyne in Streptomyces kaniharaensis

The genomic sampling method described in Example 1 was applied to genomic DNA from Streptomyces kaniharaensis ATCC 21070. This organism was not previously described to produce enediyne compounds. Both GSL and CIL genomic DNA libraries of S. kaniharaensis genomic DNA were prepared as described in Example 1.

A total of 1020 GSL clones were sequenced with the forward primer and analyzed by sequence comparison using the Blast algorithm (Altschul et al., supra). Surprisingly, one GST from S. kaniharaensis was identified as encoding a portion of the PKSE gene present in the 5-gene cassette common to enediyne biosynthetic loci. The forward read of this GST encoded the N-terminal portion of the KS domain of a PKSE gene. The complement of the reverse read of this GST encoded the C-terminal portion of the AT domain of a PKSE gene. This S. kaniharaensis GST was subsequently found in a genetic locus referred to herein as 135E which includes a 5-gene cassette common to all enediyne biosynthetic loci. The arrangement of the five genes of the cassette in 135E is shown in FIG. 6. Therefore, S. kaniharaensis has the genomic potential to produce enediyne compound(s).

Table 10 lists the results of sequence comparison using the Blast algorithm (Altschul et al., supra) for each of the enediyne-specific polypeptides from the 135E locus. Homology was determined using the BLASTP algorithm with the default parameters.

TABLE 10. 135E locus: GenBank homology

| Family | #aa | Accession, #aa | probability | identity | similarity | proposed function of GenBank match |
|---|---|---|---|---|---|---|
| PKSE | 1933 | T37056, 2082aa | 1e−85 | 282/909 (31.02%) | 365/909 (40.15%) | multi-domain beta keto-acyl synthase, Streptomyces coelicolor |
| | | BAB69208.1, 2365aa | 3e−84 | 285/925 (30.81%) | 366/925 (39.57%) | polyketide synthase, Streptomyces avermitilis |
| | | T30937, 1053aa | 2e−69 | 246/907 (27.12%) | 356/907 (39.25%) | glycolipid synthase, Nostoc punctiforme |
| TEBC | 154 | NP_249659.1, 148aa | 2e−07 | 41/132 (31.06%) | 63/132 (47.73%) | hypothetical protein, Pseudomonas aeruginosa |
| | | AAD49752.1, 148aa | 2e−06 | 40/132 (30.3%) | 62/132 (46.97%) | orf1, Pseudomonas aeruginosa |
| | | NP_214031.1, 128aa | 5e−04 | 35/127 (27.56%) | 60/127 (47.24%) | hypothetical protein, Aquifex aeolicus |
| UNBL | 323 | NO HOMOLOG | | | | |
| UNBV | 655 | CAC44518.1, 706aa | 9e−04 | 41/135 (30.37%) | 59/135 (43.7%) | putative secreted esterase, Streptomyces coelicolor |
| UNBU | 346 | NP_486037.1, 300aa | 4e−09 | 52/191 (27.23%) | 87/191 (45.55%) | hypothetical protein, Nostoc sp. |
| | | NP_440874.1, 285aa | 9e−06 | 47/197 (23.86%) | 89/197 (45.18%) | hypothetical protein, Synechocystis sp. |

The 135E genes listed in Table 10 are arranged as depicted in FIG. 6. The UNBL, UNBV, and UNBU genes span approximately 4 kb and are tandemly arranged in the order listed. The PKSE and TEBC genes span approximately 6.5 kb and are tandemly arranged in the order listed. Thus these five genes may constitute two operons. The two putative operons are separated by approximately 6 kb. Although these two clusters of genes may not be transcriptionally linked to one another, they are still functionally linked. Therefore, the 135E enediyne-specific cassette is composed of five functionally linked genes and polypeptides, three of which may be expressed as one operon and two of which may be expressed as another operon.

Example 10 The 5-Gene Enediyne Cassette is Present in the Biosynthetic Locus of an Unknown Enediyne in Streptomyces citricolor

The genomic sampling method described in Example 1 was applied to genomic DNA from Streptomyces citricolor IFO 13005. This organism was not previously described to produce enediyne compounds. Both GSL and CIL genomic DNA libraries of S. citricolor genomic DNA were prepared as described in Example 1.

A total of 1245 GSL clones were sequenced with the forward primer and analyzed by sequence comparison using the Blast algorithm (Altschul et al., supra). Several secondary metabolism loci were identified and sequenced as described in Example 1. One of these loci (herein referred to as 145B) includes a 5-gene cassette common to all enediyne biosynthetic loci. The arrangement of the five genes of the cassette in 145B is shown in FIG. 6. Therefore, S. citricolor has the genomic potential to produce enediyne compound(s).

Table 11 lists the results of sequence comparison using the Blast algorithm (Altschul et al., supra) for each of the enediyne-specific polypeptides from the 145B locus. Homology was determined using the BLASTP algorithm with the default parameters.

TABLE 11
145B locus

Family | #aa | GenBank homology (Accession, #aa) | probability | identity | similarity | proposed function of GenBank match
PKSE | 1958 | T37056, 2082aa | 4e−88 | 285/929 (30.68%) | 378/929 (40.69%) | multi-domain beta keto-acyl synthase, Streptomyces coelicolor
 | | BAB69208.1, 2365aa | 3e−82 | 284/923 (30.77%) | 375/923 (40.63%) | polyketide synthase, Streptomyces avermitilis
 | | AAL01060.1, 2573aa | 5e−78 | 240/855 (28.07%) | 354/855 (41.4%) | polyunsaturated fatty acid synthase, Photobacterium profundum
TEBC | 165 | NP_249659.1, 148aa | 2e−07 | 39/133 (29.32%) | 60/133 (45.11%) | hypothetical protein, Pseudomonas aeruginosa
 | | NP_231474.1, 155aa | 3e−04 | 30/127 (23.62%) | 60/127 (47.24%) | hypothetical protein, Vibrio cholerae
 | | CAB50777.1, 150aa | 4e−04 | 37/135 (27.41%) | 58/135 (42.96%) | hypothetical protein, Pseudomonas putida
UNBL | 324 | NO HOMOLOG | | | |
UNBV | 659 | NP_618575.1, 1881aa | 0.003 | 57/245 (23.27%) | 85/245 (34.69%) | cell surface protein, Methanosarcina acetivorans
UNBU | 337 | NP_486037.1, 300aa | 0.002 | 62/267 (23.22%) | 109/267 (40.82%) | hypothetical protein, Nostoc sp.

The 145B genes listed in Table 11 are arranged as depicted in FIG. 6. The UNBV and UNBU genes span approximately 3 kb and are tandemly arranged in the order listed. The PKSE and TEBC genes span approximately 6.5 kb and are tandemly arranged in the order listed. Thus these four genes may constitute two operons. The two putative operons are separated by approximately 9.5 kb that includes the UNBL gene. Although these genes may not be transcriptionally linked to one another, they are still functionally linked. Therefore, the 145B enediyne-specific cassette is composed of five functionally linked genes and polypeptides, four of which may be expressed as two operons each containing two genes.

Example 11 Analysis of the Polypeptides Encoded by the 5-Gene Enediyne-Specific Cassette

The amino acid sequences of the PKSE, TEBC, UNBL, UNBV, and UNBU protein families from the ten enediyne biosynthetic loci described above were compared to one another by multiple sequence alignment using the Clustal algorithm (Thompson et al., 1994, Nucleic Acids Res. 22(22):4673-4680; Higgins et al., 1996, Methods Enzymol. 266:383-402; Higgins and Sharp (1988) Gene Vol. 73 pp. 237-244). The alignments are shown in FIGS. 8, 11, 12, 13, and 14, respectively. Where applicable, conserved residues or motifs important for the function are highlighted in black and additional features are indicated.

The PKSE family is a family of polyketide synthases that are involved in formation of enediyne warhead structures. FIG. 7 summarizes schematically the domain organization of a typical PKSE, showing the position and relative size of the putative domains based on Markov modeling of PKS domains: ketosynthase (KS), acyltransferase (AT), acyl carrier protein (ACP), ketoreductase (KR), dehydratase (DH), and 4′-phosphopantetheinyl transferase (PPTE) activities. Using the calicheamicin PKSE as an example, the full-length PKSE protein is 1919 amino acids in length. As indicated in FIG. 8 for the calicheamicin PKSE, the KS domain spans positions 3 to 467 of the PKSE; the AT domain spans positions 482 to 905 of the PKSE; the ACP domain spans positions 939 to 1009 of the PKSE; a small domain of unknown function of approximately 130 amino acids (spanning positions 1025 to 1144 of the PKSE) is present between the ACP and the KR domains; the KR domain spans positions 1153 to 1414 of the PKSE; the DH domain spans positions 1421 to 1563 of the PKSE; a small domain of about 110 amino acids (spanning positions 1591 to 1701 of the PKSE) is present between the DH and the PPTE domains; and a C-terminal 4′-phosphopantetheinyl transferase (PPTE) domain spans positions 1708 to 1914 of the PKSE.
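The domain coordinates given above for the calicheamicin PKSE can be collected into a simple map to derive domain lengths and to confirm that the domains are ordered and non-overlapping. The following is a minimal sketch; the labels used for the two unnamed small domains and the function names are illustrative assumptions.

```python
# Domain boundaries of the calicheamicin PKSE (1-based, inclusive),
# transcribed from the description of FIG. 8.
PKSE_LENGTH = 1919
DOMAINS = [
    ("KS", 3, 467),
    ("AT", 482, 905),
    ("ACP", 939, 1009),
    ("small domain (ACP/KR)", 1025, 1144),   # label is an assumption
    ("KR", 1153, 1414),
    ("DH", 1421, 1563),
    ("small domain (DH/PPTE)", 1591, 1701),  # label is an assumption
    ("PPTE", 1708, 1914),
]


def domain_lengths(domains):
    """Return the length in residues of each domain."""
    return {name: end - start + 1 for name, start, end in domains}


def is_ordered_and_non_overlapping(domains, total_length):
    """Check that domains run N- to C-terminus without overlap."""
    previous_end = 0
    for _, start, end in domains:
        if start <= previous_end or end < start or end > total_length:
            return False
        previous_end = end
    return True


if __name__ == "__main__":
    for name, length in domain_lengths(DOMAINS).items():
        print(f"{name}: {length} aa")
    print("ordered:", is_ordered_and_non_overlapping(DOMAINS, PKSE_LENGTH))
```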

The PKSE contains a conserved unusual ACP domain (FIG. 9A). This ACP domain contains several conserved residues that are also present in the well-characterized ACP of the actinorhodin type II PKS (PDB id: 1AF8 in FIG. 9B). The most important conserved residue is the serine residue to which a 4′-phosphopantetheine prosthetic group is covalently attached (corresponding to Ser-42 of 1AF8). In addition to Ser-42, several surface-exposed charged residues are conserved, namely Glu-20, Asp-37, and Glu-84 (highlighted in the alignment of FIG. 9A and highlighted and labeled in the three dimensional structure shown in FIG. 9B). Several buried uncharged or non-polar residues that may be important in stabilizing the overall fold of the ACP domain are also conserved, namely Leu-14, Val-15, Gly-57, Pro-71, Ala-83, and Ala-85 (highlighted in the alignment and three dimensional structure shown in FIG. 9). Interestingly, the conserved serine (Ser-42) is almost always immediately preceded by another serine in the ACP domains of PKSEs. As shown in FIG. 8, nine of the ten PKSE members contain this double serine arrangement, the only exception being that from the 132H locus in which the first of the serines is replaced by a threonine. Therefore, PKSEs contain ACP domains with two potential hydroxyl-containing residues in close proximity to one another. These ACPs may carry two 4′-phosphopantetheine prosthetic groups. The positioning of the KR and DH domains after the ACP is unusual among PKSs, but is described in one of the three PKS-like components of the eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) biosynthetic machinery (Metz et al. (2001) Science Vol. 293 pp. 290-293). The unusual domain organization shared by the PKSE genes of the invention and the PKS-like synthetase involved in synthesis of polyunsaturated fatty acids suggests that enediyne warhead formation involves intermediates similar to those generated during assembly of polyunsaturated fatty acids.

The presence of an unusual ACP domain in the PKSE, and the absence of any obvious 4′-phosphopantetheinyl transferase or holo-ACP synthase (involved in phosphopantetheinyl transfer onto the conserved serine of the ACP) common to enediyne biosynthetic loci led us to search for the presence of a 4′-phosphopantetheinyl transferase. We examined the conserved domains of the PKSE whose functions were unaccounted for as well as the UNBL, UNBV, and UNBU polypeptides in more detail and determined that the PPTE domain was a 4′-phosphopantetheinyl transferase.

The C-terminal domains of the PKSEs from the biosynthetic loci of three known enediynes, namely neocarzinostatin (NEOC, aa 1620-1977), calicheamicin (CALI, aa 1562-1919) and macromomycin (MACR, aa 1582-1936), were analyzed for their folding using secondary structure predictions and solvation potential information (Kelley et al., (2000) J. Mol. Biol. Vol. 299 pp. 499-520). Comparison searches using a database of known 3-D structures of proteins revealed similarities between the C-terminal domains of the PKSEs and Sfp, the 4′-phosphopantetheinyl transferase from the Bacillus subtilis surfactin biosynthetic locus (Reuter et al. (1999) EMBO J. Vol. 18 pp. 6823-6831). The alignment shown in FIG. 10A indicates the predicted secondary structures of all three C-terminal PKSE domains (PPTE domains) along with the X-ray crystallography-determined secondary structure of Sfp (PDB id: 1QR0). α-Helices are indicated by rectangles and β-sheets by arrows.

An overall conservation of secondary structure over the entire length of the proteins is evident. All major structural constituents of Sfp, namely α-helices α1-α5 and β-sheets β2-β4 and β8, are also present in PPTE domains. Similar to Sfp, the PPTE domains are predicted to have an intramolecular 2-fold pseudosymmetry.

The loop formed between α5 and β7 in Sfp is not present in the PPTE domains. It is believed that this region of Sfp is in part responsible for ACP recognition and contributes to the broad substrate specificity observed for this enzyme. The size of this loop appears to vary among phosphopantetheinyl transferases, as the EntD enzyme, which exhibits a greater ACP substrate specificity than Sfp, has a region between α5 and β7 structures shorter than that of Sfp but longer than that found in the PPTE domains. The short α5/β7 loop region found in the PPTE domains may reflect the need for a specific interaction with the rather unusual ACP domain found in the PKSE enzymes. Residues conserved in all phosphopantetheinyl transferases and shown in Sfp to make contacts with the CoA substrate and Mg⁺⁺ cofactor are also conserved in the PPTE domains (highlighted in FIG. 10A).

Referring to FIG. 10B, Sfp residues Lys-28 and Lys-31 make salt bridges with the 3′-phosphate of CoA and are not found in the PPTE domains; however, a similar interaction could be provided by the corresponding conserved residue Arg-26. Sfp Thr-44 makes a hydrogen bond and His-90 a salt bridge with the 3′-phosphate of CoA; similar hydrogen bonding potential is provided by the conserved serine found at the corresponding position 44 of the PPTE domains, while the histidine 90 residue is absolutely conserved in all three PPTE domains.

Sfp amino acid residues 73-76 hold in place the adenine base of CoA. The main chain carbonyl of Tyr-73 forms a hydrogen bond with the adenine amino group and residues Gly-74, Lys-75 and Pro-76 hold firmly in place the adenine ring. In the PPTE domains, a conserved aspartic acid that may form a salt bridge with the adenine amino group is substituted for Tyr-73 and a conserved arginine residue is substituted for Lys-75. The remaining two residues, Gly-74 and Pro-76, are also found in the PPTE domains.

Sfp residues Ser-89 and His-90 interact via hydrogen bonding and salt bridging with the α-phosphate of the CoA substrate. Similarly, Lys-155 in helix α5 interacts with the CoA α-phosphate. The His-90 and Lys-155 residues are highly conserved in the PPTE domains whereas Ser-89 is found only in the neocarzinostatin PPTE domain.

Sfp residues Asp-107 and Glu-109 in the β4 sheet and Glu-151 in the α5 helix participate in the complexation of a metal ion (presumably Mg⁺⁺) together with the α and β phosphates of the CoA pyrophosphate and a water molecule. All three residues are also conserved in PPTE domains. Importantly, Asp-107 was altered by mutagenesis in Sfp and shown to be critical for catalytic activity but not for CoA binding of the protein, suggesting the Mg⁺⁺ ion is important for catalysis (Quadri et al., 1998, Biochemistry, Vol. 37, 1585-1595).

In the Sfp protein, residue Glu-127 salt-bridges the amino group of Lys-150. In the PPTE domains, a Glu/Asp residue is found at the corresponding position 127, whereas Lys-150 is not conserved. Since Glu-127 is highly conserved in the PPTE domains, it is conceivable that the role of Lys-150 is served by other basic residues in the vicinity, namely the conserved arginine at the corresponding position 145. Residue Trp-147, conserved in all phosphopantetheinyl transferases and shown to be critical for catalytic activity, is also present in all three PPTE domains (Quadri et al., 1998, Biochemistry, Vol. 37, 1585-1595).

The presence of a phosphopantetheinyl domain (PPTE) in the C-terminal part of the PKSE enediyne warhead PKS is reminiscent of the 4′-phosphopantetheinyl domain found in the yeast fatty acid synthase (FAS) complex, where it resides in the C-terminal region of the FAS α subunit. FAS is capable of auto-pantetheinylation resulting in a post-translational autoactivation of this enzyme (Fichtlscherer et al., 2000, Eur. J. Biochem., Vol. 267, 2666-2671). In a similar manner, the PKSE warhead PKSs are likely to be capable of auto-pantetheinylation and activation of their ACP domains before proceeding to the iterative synthesis of the polyunsaturated polyketide intermediate forming the enediyne core.

The ACP and KR domains of the PKSEs are separated by approximately 130 amino acids. The presence of a considerable number of invariable residues within this stretch of amino acids suggests that the putative domain formed by these 130 amino acids has a functional role. The putative domain may serve a structural role, for example as a protein-protein interaction domain or it may form a cleft adjacent to the ACP that acts as a “chain length factor” for the growing polyketide chain. A search of NCBI's Conserved Domain Database with Reverse Position Specific BLAST revealed several short stretches of homology to proteins that bind substrates such as ATP, AMP, NAD(P), as well as folates and double stranded RNA (adenosine deaminase). Thus, the putative domain may adopt a structure accommodating an adenosine or adenosine-like structure and serve as a cofactor-binding site. Alternatively, the domain might interact with the adenosine moiety of coenzyme A (CoA). As such, the physical proximity of the CoA to the ACP domain may facilitate the phosphopantetheinylation of the ACP. Yet another possibility is that a molecule of CoA is noncovalently bound to the putative domain downstream of the ACP via its adenosine moiety and its phosphopantetheinyl tail protrudes out from the enzyme, as would the phosphopantetheinyl tail on the holo-ACP. Alternatively, the PPTE domain can carry a molecule of noncovalently bound CoA. Thus, it is expected that KS carries out several iterations of condensation reactions involving the transfer of an acetyl group from an acetyl-ACP-thioester to a growing acyl-CoA chain that is non-covalently bound to the enzyme. The proposed scenario explains the presence of the TEBC, an acyl-CoA thioesterase rather than a “conventional” PKS-type thioesterase: the full-length polyketide chain generated by the PKSE is not tethered to the holo-ACP, but rather to a non-covalently bound CoA and the TEBC hydrolyzes the thioester bond of a polyketide-CoA to release the full-length polyketide and CoA. A CoA-activated thioester may render the polyketide more accessible to auxiliary enzymes involved in cyclization and acetylenation prior to or concomitant to hydrolytic release by TEBC.

FIG. 11 is a Clustal amino acid alignment showing the relationship between the TEBC family of proteins and the enzyme 4-hydroxybenzoyl-CoA thioesterase (1BVQ) of Pseudomonas sp. Strain CBS-3 for which the crystal structure has been previously determined (Benning et al. (1998) J. Biol. Chem. Vol. 273 pp. 33572-33579). The black bars highlight the three regions of conservation believed to play important roles in the catalysis for 4-hydroxybenzoyl-CoA thioesterase. Homology between the TEBC family of proteins and 1BVQ is concentrated in these three highlighted regions.

FIG. 12 is a Clustal amino acid alignment of the UNBL family of proteins. The UNBL family of proteins represents a novel group of conserved proteins that are unique to enediyne biosynthetic loci. The UNBL proteins are rich in basic residues and contain several conserved or invariant histidine residues. Besides the PKSE and TEBC proteins, the UNBL proteins are the only other proteins predicted by the PSORT program (Nakai et al., (1999) Trends Biochem. Sci. Vol. 24 pp. 34-36) to be cytosolic that are encoded by the enediyne warhead gene cassette and thus represent the best candidates for the acetylenase activity that is required to introduce triple bonds into the warhead structure.

FIG. 13 is a Clustal amino acid alignment of the UNBV family of proteins. PSORT analysis of the UNBV family of proteins predicts that they are secreted proteins. The approximate position of the putative cleavable N-terminal signal sequence is indicated above the alignment. The UNBV proteins display considerable amino acid conservation but do not have any known homologue. Thus, the UNBV family of proteins represents a novel group of conserved proteins of unknown function that are unique to enediyne biosynthetic loci.

FIG. 14 is a Clustal amino acid alignment of the UNBU family of proteins. PSORT analysis of the UNBU family of proteins predicts that they are integral membrane proteins with seven or eight putative membrane-spanning alpha helices (indicated by dashes in FIG. 14). The UNBU proteins display considerable amino acid conservation but do not have any known homologue. The UNBU family of proteins represents a novel group of conserved proteins that are unique to enediyne biosynthetic loci.

UNBU is likely involved in transport of the enediynes across the cell membrane. UNBU may also contribute, in part, to the biochemistry involved in the completion of the warhead. In the case of chromoprotein enediynes, the apoprotein carries its own cleavable N-terminal signal sequence and is probably exported independently of the chromoprotein by the general protein secretion machinery. Formation of the bioactive warhead, export, and binding of the chromophore and protein component must occur in and around the cell membrane to minimize damage to the producer and to maximize the stability of the natural product. UNBV is predicted to be an extracellular protein. UNBV may finalize or stabilize the warhead structure. UNBV may act in close association with the extracellularly exposed portion(s) of UNBU.

To date, we have sequenced over ten enediyne biosynthetic loci that contain the 5-gene cassette made up of PKSE, TEBC, UNBL, UNBV, and UNBU genes. In all cases, the PKSE and TEBC genes are adjacent to one another and the TEBC gene is always downstream of the PKSE gene. Moreover, these two genes are usually, if not always, translationally coupled. These observations suggest that the expression of the PKSE and TEBC genes is tightly coordinated and that their gene products, i.e., polypeptides, act together. Likewise, the UNBV and UNBU genes are always adjacent to one another and the UNBU gene is always downstream of the UNBV gene. Moreover, these two genes are usually, if not always, translationally coupled. These observations suggest that the expression of the UNBV and UNBU genes is tightly coordinated and that their gene products, i.e., polypeptides, act together.
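The adjacency observations above lend themselves to a simple check on the gene order of a candidate locus. The following is a minimal sketch, assuming a locus is represented as a list of gene family names in the direction of transcription. The function names are hypothetical, and the relative placement of the two gene clusters in the 135E and 145B examples is an assumption made for illustration, since only the order within each cluster is stated in Examples 9 and 10.

```python
CASSETTE = {"PKSE", "TEBC", "UNBL", "UNBV", "UNBU"}


def immediately_downstream(order, upstream, downstream):
    """True if `downstream` directly follows `upstream` in the gene order."""
    i = order.index(upstream)
    return i + 1 < len(order) and order[i + 1] == downstream


def has_cassette_arrangement(order):
    """Check for all five families plus the PKSE/TEBC and UNBV/UNBU pairing."""
    if not CASSETTE.issubset(order):
        return False
    return (immediately_downstream(order, "PKSE", "TEBC")
            and immediately_downstream(order, "UNBV", "UNBU"))


# Illustrative gene orders; placement of one cluster relative to the
# other is assumed, not taken from the source.
LOCUS_135E = ["UNBL", "UNBV", "UNBU", "other", "PKSE", "TEBC"]
LOCUS_145B = ["UNBV", "UNBU", "UNBL", "PKSE", "TEBC"]

if __name__ == "__main__":
    print("135E:", has_cassette_arrangement(LOCUS_135E))
    print("145B:", has_cassette_arrangement(LOCUS_145B))
```

A real screen would also need to account for strand orientation and intergenic distances, which this sketch ignores.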

Example 12 Common Mechanism for the Biosynthesis of Enediyne Warheads

Without intending to be limited to any particular biosynthetic scheme or mechanism of action, the genes and proteins of the present invention can explain formation of enediyne warheads in both chromoprotein enediynes and non-chromoprotein enediynes.

The PKSE is proposed to generate a highly conjugated polyunsaturated hepta/octaketide intermediate in a manner analogous to the action of polyunsaturated fatty acid synthases (PUFAs). The polyunsaturated fatty acyl intermediate is then modified by tailoring enzymes involving one or more of UNBL, UNBU and UNBV to introduce the acetylene bonds and form the ring structure(s). The conserved auxiliary proteins UNBL, UNBU and UNBV are expected to be involved in modulating iterations performed by the PKSE, or in subsequent transformations to produce the enediyne core in a manner analogous to action of lovastatin nonaketide synthase, a fungal iterative type I polyketide synthase that is able to perform different oxidative/reductive chemistry at each iteration with the aid of at least one auxiliary protein (Kennedy et al., 1999, Science Vol. 284 pp. 1368-1372).

The acetate enrichment pattern of the enediyne moiety of esperamicin and dynemicin suggests that both are derived from an intact heptaketide/octaketide. It has been suggested that esperamicin and dynemicin may share a common precursor (Lam et al., J. Am. Chem. Soc. 1993, Vol. 115 pp. 12340). However, in the case of neocarzinostatin, representative of other chromoprotein enediynes, incorporation studies investigating carbon-carbon connectivities, which revealed that the final enediyne core contains uncoupled acetate atoms (Hensens et al., 1989, JACS, Vol. 111, pp. 3295-3299), and other studies regarding polyacetylene biosynthesis (Hensens et al., supra) suggest that the chromoprotein enediyne precursors are distinct from those of the non-chromoprotein enediynes. Thus, prior art studies regarding formation of the enediyne core teach away from the present invention that genes and proteins common to both chromoprotein enediynes and non-chromoprotein enediynes are responsible for formation of the warhead in both classes of enediynes.

We propose that skeletal rearrangements may account for the distinct chromoprotein/nonchromoprotein enediyne labeling patterns. For instance, thermal electrocyclic rearrangement of an intermediate cyclobutene to a 1,3 diene could result in an isotopic labeling pattern consistent with that which has been reported.

Accordingly, the warhead precursor in the formation of neocarzinostatin could be a heptaketide, similar to that proposed for the other classes of enediynes. Since calicheamicin and esperamicin do not contain any uncoupled acetates, the common unsaturated polyketidic precursor must rearrange differently from the chromoprotein class. However, the proposed biosynthetic scheme is consistent with one aspect of the present invention, namely that warhead formation in all enediynes involves common genes, proteins and common precursors.

Example 13 Heterologous Expression of Genes and Proteins of the Calicheamicin Enediyne Cassette

Escherichia coli was used as a general host for routine subcloning. Streptomyces lividans TK24 was used as a heterologous expression host. The plasmid pECO1202 was derived from plasmid pANT1202 (Desanti, C. L. 2000. The molecular biology of the Streptomyces snp Locus, 262 pp., Ph.D. dissertation, Ohio State Univ., Columbus, Ohio) by deleting the KpnI site in the multi-cloning site (MCS). pECO1202RBS contains a DNA sequence encoding a putative ribosome-binding site (AGGAG) introduced just upstream of the ClaI site located in the MCS of pECO1202.

E. coli strains carrying plasmids were grown in Luria-Bertani (LB) medium and were selected with appropriate antibiotics. S. lividans TK24 strains were grown on R2YE medium (Kieser, T. et al., Practical Streptomyces Genetics, The John Innes Foundation, Norwich, United Kingdom, 2000).

Preparation of S. lividans TK24 protoplasts was carried out using the standard protocols (Kieser et al., supra). Polyethylene glycol-induced protoplast transformation was carried out with 1 μg DNA per transformation. After protoplast regeneration on R5 agar medium for 16 h at 30° C., transformants were selected by overlaying each R5 plate with 50 μg/ml apramycin solutions. Transformants were grown in 50 ml flasks containing R2YE medium plus apramycin for seven days.

SDS-PAGE and Western-blotting were carried out by standard procedures (Sambrook, J. et al. 1989. Molecular cloning: a laboratory manual, 2^(nd) ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.). Penta-His antibody was obtained from Qiagen. Western blots were performed using the ECL detection kit from Amersham Pharmacia Biotech using the manufacturer's suggested protocols. One milliliter of seven-day S. lividans culture was centrifuged and the mycelium was resuspended in cold extraction buffer (0.1 M Tris-HCl, pH 7.6, 10 mM MgCl₂ and 1 mM PMSF). The mycelium was sonicated 4×20 sec on ice with 1 min intervals to release soluble protein. After 10 min centrifugation at 20,000×g, the supernatant and pellet fractions were diluted with sample buffer and subjected to SDS-PAGE and Western-blotting analysis.

DNA manipulations used in construction of expression plasmids were carried out using standard methods (Sambrook, J. et al., supra). The plasmid pECO1202 was used as the parent plasmid. Cosmid 061CR, carrying the calicheamicin biosynthetic gene locus, was digested with MfeI, and the restriction fragments were made blunt ended by treatment with the Klenow fragment of DNA polymerase I. Upon additional digestion with BglII after phenol extraction and ethanol precipitation, the resulting 11.5 kb blunt-ended, BglII fragment was gel purified and cloned into pECO1202 (previously digested with EcoRI, made blunt ended by treatment with Klenow fragment of polymerase I, then digested with BamHI), to yield pECO1202-CALI-1, as shown in FIG. 15.

PCR was carried out on a PTC-100 programmable thermal controller (MJ Research) with Pfu polymerase and buffer from Stratagene. A typical PCR mixture consisted of 10 ng of template DNA, 20 μM dNTPs, 5% dimethyl sulfoxide, 2 U of Pfu polymerase, 1 μM primers, and 1× buffer in a final volume of 50 μl. The PCR temperature program was the following: initial denaturation at 94° C. for 2 min, 30 cycles of 45 sec at 94° C., 1 min at 55° C., and 2 min at 72° C., followed by an additional 7 min at 72° C. A PCR product amplified by primer 1402, 5′-GAGTTGTATCGATGAGCAGGATCGCCGTCGTCGGC-3′ [containing ClaI site (italic) and the start codon of PKSE gene (bold)], and primer 1420, 5′-GTAGCCGGCCGCCTCCGGCC-3′ (corresponding to the nucleotide sequence 940 to 959 bp of PKSE), was digested with ClaI and NheI and gel purified. This fragment was then cloned into ClaI, NheI digested pECO1202-CALI-1 to yield pECO1202-CALI-5 (FIG. 16).
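Because the italic and bold markings that identify the ClaI site and the start codon in primer 1402 are not visible in plain text, the features can be located by a simple string search. The following is a minimal sketch; the ClaI recognition sequence ATCGAT is standard, while the function and variable names are illustrative assumptions.

```python
# Primer sequences transcribed from the text (5' to 3').
PRIMER_1402 = "GAGTTGTATCGATGAGCAGGATCGCCGTCGTCGGC"
PRIMER_1420 = "GTAGCCGGCCGCCTCCGGCC"

CLAI_SITE = "ATCGAT"  # ClaI recognition sequence
START_CODON = "ATG"


def locate_features(primer):
    """Return 0-based positions of the ClaI site and the first ATG at or after it."""
    site = primer.find(CLAI_SITE)
    if site == -1:
        raise ValueError("no ClaI site found")
    start = primer.find(START_CODON, site)
    if start == -1:
        raise ValueError("no start codon found at or after the ClaI site")
    return site, start


def codons_from(primer, start):
    """Split the primer into complete codons beginning at `start`."""
    region = primer[start:]
    return [region[i:i + 3] for i in range(0, len(region) - len(region) % 3, 3)]


if __name__ == "__main__":
    site, start = locate_features(PRIMER_1402)
    print("ClaI site at", site, "start codon at", start)
    print("codons:", codons_from(PRIMER_1402, start))
    print("primer 1420 length:", len(PRIMER_1420))
```

In this primer the start codon overlaps the 3′ end of the ClaI site, which places the restriction site immediately at the start of the coding sequence.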

PCR products were amplified by primer 1421, 5′-GACCTGCCGTACACCGTCTCC-3′ (corresponding to the nucleotide sequence 5367 to 5387 bp of PKSE), and primer 1403, 5′-CCCAAGCTTCAGTGGTGGTGGTGGTGGTGCCCCTGCCCCACCGTGGCCGAC-3′ [containing a His Tag (underlined), HindIII site (italic) and stop codon of TEBC (bold)], or primer 1500, 5′-CCCAAGCTTCACCCCTGCCCCACCGTGGCCGAC-3′ [containing HindIII site (italic) and stop codon (bold) of TEBC]. These PCR products were digested with HindIII and PstI, gel purified, and then cloned into HindIII, PstI digested pECO1205 to yield pECO1202-CALI-2 (with HisTag) and pECO1202-CALI-3 (without HisTag), respectively (FIG. 16).
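Primers 1403 and 1500 are reverse primers, so the His tag and stop codon they introduce are read on the reverse complement. The following is a minimal sketch that recovers the coding-strand view and counts the histidine codons placed directly before the stop codon; the helper names are illustrative assumptions, and the sketch relies on the stop codon (TGA) overlapping the HindIII site (AAGCTT) as it does in these two primers.

```python
# Reverse primers transcribed from the text (5' to 3').
PRIMER_1403 = "CCCAAGCTTCAGTGGTGGTGGTGGTGGTGCCCCTGCCCCACCGTGGCCGAC"
PRIMER_1500 = "CCCAAGCTTCACCCCTGCCCCACCGTGGCCGAC"

HINDIII_SITE = "AAGCTT"  # palindromic recognition sequence
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def his_codons_before_stop(primer):
    """Count consecutive CAC (His) codons directly upstream of the TGA stop."""
    coding = reverse_complement(primer)
    site = coding.rfind(HINDIII_SITE)
    # The stop codon TGA shares its final A with the HindIII site.
    if site < 2 or coding[site - 2:site + 1] != "TGA":
        raise ValueError("expected a TGA stop codon overlapping the HindIII site")
    count, pos = 0, site - 2
    while pos >= 3 and coding[pos - 3:pos] == "CAC":
        count += 1
        pos -= 3
    return count


if __name__ == "__main__":
    print("coding strand, 1403:", reverse_complement(PRIMER_1403))
    print("His codons, 1403:", his_codons_before_stop(PRIMER_1403))
    print("His codons, 1500:", his_codons_before_stop(PRIMER_1500))
```

The two primers differ only by the run of histidine codons, which is why the resulting constructs are described as identical apart from the His tag.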

The ClaI and HindIII fragments from pECO1202-CALI-2 and pECO1202-CALI-3 were cloned into pECO1202RBS to yield pECO1202-CALI-6 (with HisTag) and pECO1202-CALI-7 (without HisTag), respectively, as shown in FIG. 16.

Six transformants of S. lividans TK24 harboring pECO1202-CALI-2 were analyzed for expression of the His-tagged TEBC protein. Referring to FIG. 17, lane M provides molecular weight markers; lanes 1 to 6 represent crude extracts of independent transformants of S. lividans TK24 harboring pECO1202-CALI-2; lane 7 represents a crude extract of S. lividans TK24 harboring pECO1202-CALI-4; and lane 8 represents a crude extract of S. lividans TK24 harboring pECO1202 (control). TEBC protein expression was detected in four pECO1202-CALI-2 transformants by Western blotting using an antibody that recognizes the His-tag (lanes 2, 3, 5, 6). TEBC protein expression was also observed in the transformant of S. lividans TK24 harboring pECO1202-CALI-4 (lane 7).

As shown in FIG. 12, the TEBC protein was expressed as a soluble protein in S. lividans although the pellet fraction also contains TEBC protein, perhaps reflecting insoluble protein or incomplete lysis of S. lividans by the sonication procedure used. FIG. 12 provides an analysis of His-tagged TEBC protein derived from recombinant S. lividans TK24 by immunoblotting. The soluble and insoluble protein fractions of S. lividans transformants were separated by 12% SDS-polyacrylamide gel electrophoresis, blotted to PVDF membrane, and detected with the Penta-His antibody. Referring to FIG. 12, lane M provides molecular weight markers; lanes 1 to 6 represent soluble (S) and pellet (P) protein fractions of independent transformants of S. lividans TK24 harboring pECO1202-CALI-2; lane C represents protein fractions of S. lividans TK24 harboring pECO1202 (control).

Example 14 Disruption of the PKSE Gene Abolishes Production of Enediyne

To confirm that the PKSE is critical to the biosynthesis of enediynes, the PKSE gene of the calicheamicin producer, M. echinospora, was disrupted by introduction of an apramycin selectable marker as follows. M. echinospora was grown with a 1:100 fresh inoculum in 50 mL MS medium (Kieser et al., supra) supplemented with 5% PEG 8000 and 5 mM MgCl₂ for 24-36 h; 0.5% glycine was added 6 h prior to harvest. The digest of the cell wall was accomplished via published procedures with the exception that 5 mg mL⁻¹ lysozyme and 2000 U mutanolysin were used. Under these conditions, protoplast formation was complete within 30-60 min, after which the mixture was filtered twice through cotton wool. Transformation was accomplished via typical methodology (Kieser et al., supra) with a 1:1 mixture of T-buffer and PEG 2000 containing up to 10 μg of alkaline denatured DNA per transformation. The protoplasts were then plated on R2YE plates supplemented with 10 mg L⁻¹ CoCl₂ and submitted to antibiotic pressure (70 μg mL⁻¹ apramycin) after 3-4 days. To date, all attempts to use methods other than protoplast chemical transformation (e.g. phage transduction, conjugation and electroporation) have failed to introduce DNA into M. echinospora. Low transformation efficiencies were observed in all calicheamicin-producing Micromonospora strains tested, including those developed from strain improvement efforts. In comparison to other actinomycetes, M. echinospora protoplast regeneration was found to be slow (˜4 weeks). Moreover, integration into the locus requires homologous fragments exceeding 3 kb in size, as constructs containing PKSE fragments (or other calicheamicin gene fragments) smaller than 3 kb all failed to integrate into the chromosome (data not shown).

Nine independent apramycin-resistant PKSE disruption clones were obtained. All nine isolates mapped consistently with the expected PKSE gene disruption both by PCR fragment amplification and by Southern hybridization (data not shown). All nine PKSE disruption mutants and two parental controls were subsequently tested in parallel for calicheamicin production. Extracts from these strains were prepared as follows. Fresh M. echinospora cells grown in R2YE were inoculated 1:100 in 10 mL medium E (Kieser et al., supra) in stoppered 25 ml glass tubes containing a 4 cm stainless coil spring for better aeration and incubated on an orbital shaker at 230 rpm and 28° C. for one to three weeks. A 600 μl aliquot was removed at various time points, extracted with an equal volume of EtOAc and centrifuged at 10000×g for 5 min in a benchtop centrifuge. The supernatant was concentrated to dryness, the pellet redissolved in 200 μl acetonitrile, centrifuged again and the supernatant removed, concentrated to dryness and the residual material finally dissolved in 10 μl acetonitrile. One μl of this solution was utilized for the bioassays and the remaining 8 μl aliquot was utilized for analysis by HPLC (Ultrasphere-ODS chromatography, 5 μm, 4.6 mm×250 mm, 55:45 CH₃CN-0.2 NH₄OAc, pH 6.0, 1.0 mL min⁻¹, 280 nm detection). A typical M. echinospora fermentation contains a mixture of calicheamicins that are resolved by HPLC, namely γ₁^(I) (retention time 7 min, ˜60%), δ₁^(I) (retention time 5.7 min, ˜30%), and α₃^(I) (retention time 3.8 min, ˜10%), and all of these calicheamicin components contribute to bioassay activities. The best production was found to occur during late log or early stationary phase growth. The estimate of calicheamicin production by parental M. echinospora is 0.78-0.85 mg mL⁻¹. Extracts were analyzed by i) the biological induction assay, a modified prophage induction assay used in the original discovery of the calicheamicins (Greenstein et al., (1986) Antimicrob. Agents Chemotherap. Vol. 29, 861); ii) the molecular break light assay, a DNA-cleavage assay based upon intramolecular fluorescence quenching optimized for DNA-cleavage by enediynes (in which fM calicheamicin concentrations are detectable) (Biggins et al. (2000) Proc. Natl. Acad. Sci. USA Vol. 97, 13537); and iii) high-performance liquid chromatography (HPLC) (described above). As expected, all three methods revealed that the parental M. echinospora fermentations produced 0.5-0.8 mg L⁻¹. In contrast, the PKSE gene disruption mutant strains were devoid of any calicheamicin, known calicheamicin derivatives and/or enediyne activity by all three methods of detection. The elimination of calicheamicin production brought about by disruption of the PKSE gene indicates that it provides an essential activity for biosynthesis of calicheamicin. Based on the presence of the PKSE in all enediyne biosynthetic loci sequenced to date and on their overall conservation, it is expected that PKSEs fulfill the same, essential function in the biosynthesis of all enediyne structures.

The present invention is not to be limited in scope by the specific embodiments described herein. Indeed, various modifications of the invention in addition to those described herein will become apparent to those skilled in the art from the foregoing description and the accompanying figures. Such modifications are intended to fall within the scope of the appended claims.

It is further to be understood that all sizes and all molecular weight or mass values are approximate, and are provided for description.

Some open reading frames listed herein initiate with non-standard initiation codons (e.g. GTG, encoding valine, or TTG, encoding leucine) rather than the standard initiation codon ATG, namely SEQ ID NOS: 2, 8, 16, 28, 30, 32, 38, 40, 42, 48, 54, 56, 70, 74, 76, 78, 80, 82, 84, 86, 88, 92, 98, 100. All ORFs are listed with M, V or L amino acids at the amino-terminal position to indicate the specificity of the first codon of the ORF. It is expected, however, that in all cases the biosynthesized protein will contain a methionine residue, and more specifically a formylmethionine residue, at the amino-terminal position, in keeping with the widely accepted principle that protein synthesis in bacteria initiates with methionine (formylmethionine) even when the encoding gene specifies a non-standard initiation codon (e.g. Stryer, Biochemistry 3rd edition, 1998, W.H. Freeman and Co., New York, pp. 752-754).
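The distinction drawn above, between the residue listed for the first codon (M, V or L) and the methionine expected in the biosynthesized protein, can be sketched in a few lines of Python. This is a hedged illustration rather than the procedure used to generate the sequence listing: the function name `translate_orf` and its `initiator_as_met` flag are hypothetical, the example sequences are made up, and only ATG, GTG and TTG are treated as initiation codons because those are the ones named in the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Initiation codons mentioned in the text (assumption: no others are handled).
START_CODONS = {"ATG", "GTG", "TTG"}


def translate_orf(dna: str, initiator_as_met: bool = True) -> str:
    """Translate an ORF, stopping at the first stop codon.

    With initiator_as_met=True the first residue is reported as M whenever
    the ORF begins with ATG, GTG or TTG, reflecting initiation with
    (formyl)methionine. With False, the first codon is decoded literally,
    which mirrors listing the ORF with M, V or L at the amino terminus.
    """
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    protein = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        residue = CODON_TABLE[codon]  # KeyError on non-ACGT characters
        if residue == "*":
            break
        if i == 0 and initiator_as_met and codon in START_CODONS:
            residue = "M"
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    orf = "GTGAAACGCTAA"  # made-up ORF starting with GTG
    print(translate_orf(orf, initiator_as_met=False))  # literal first codon
    print(translate_orf(orf))                          # initiator as Met
```

Keeping the literal decoding available behind a flag preserves the information about which initiation codon the gene actually uses, while the default output reflects the expected amino-terminal residue.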

Patents, patent publications, procedures and publications cited throughout this application are incorporated herein in their entirety for all purposes.

1-28. (canceled)
 29. An enediyne polyketide catalytic complex comprising an enediyne polyketide synthase (PKSE) and a thioesterase (TEBC), wherein said PKSE comprises a C-terminal phosphopantetheinyl transferase (PPTE) domain.
 30. The enediyne polyketide catalytic complex of claim 29, wherein said TEBC is adjacent to the PKSE.
 31. The enediyne polyketide catalytic complex of claim 29, wherein said enediyne catalytic complex encoded by a nucleic acid is isolated from a bacterium.
 32. The enediyne polyketide catalytic complex of claim 31, wherein said bacterium is Streptomyces macromyceticus.
 33. The enediyne polyketide catalytic complex of claim 31, wherein said bacterium is Micromonospora echinospora subsp. calichensis.
 34. The enediyne polyketide catalytic complex of claim 31, wherein said bacterium is Streptomyces ghanaensis.
 35. The enediyne polyketide catalytic complex of claim 31, wherein said bacterium is Streptomyces carzinostaticus subsp. neocarzinostaticus.
 36. The enediyne polyketide catalytic complex of claim 31, wherein said bacterium is Amycolatopsis orientalis.
 37. The enediyne polyketide catalytic complex of claim 31, wherein said bacterium is Kitasatosporia sp.
 38. The enediyne polyketide catalytic complex of claim 31, wherein said bacterium is Micromonospora megalomicea.
 39. The enediyne polyketide catalytic complex of claim 31, wherein said bacterium is Saccharothrix aerocolonigenes.
 40. The enediyne polyketide catalytic complex of claim 31, wherein said bacterium is Streptomyces kaniharaensis.
 41. The enediyne polyketide catalytic complex of claim 31, wherein said bacterium is Streptomyces citricolor.
 42. The enediyne polyketide catalytic complex of claim 29, wherein said PKSE is selected from the group consisting of: SEQ ID NO: 1, SEQ ID NO: 13, SEQ ID NO: 23, SEQ ID NO: 33, SEQ ID NO: 43, SEQ ID NO: 53, SEQ ID NO: 63, SEQ ID NO: 73, SEQ ID NO: 83 and SEQ ID NO: 93.
 43. The enediyne polyketide catalytic complex of claim 29, wherein said TEBC is selected from the group consisting of: SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 15, SEQ ID NO: 25, SEQ ID NO: 35, SEQ ID NO: 45, SEQ ID NO: 55, SEQ ID NO: 65, SEQ ID NO: 75, SEQ ID NO: 85 and SEQ ID NO: 95.
 44. An enediyne polyketide catalytic complex for the biosynthesis of the warhead structure of an enediyne compound, wherein said polyketide catalytic complex comprises an enediyne polyketide synthase (PKSE) and a thioesterase (TEBC), wherein said PKSE comprises a C-terminal phosphopantetheinyl transferase (PPTE) domain.
 45. The enediyne polyketide catalytic complex obtained from cosmid 020CN deposited with the International Depositary Authority of Canada having accession no. IDAC 030402-1.
 46. The enediyne polyketide catalytic complex obtained from cosmid 061CR deposited with the International Depositary Authority of Canada having accession no. IDAC 030402-2.
 47. A method of preparing an enediyne warhead structure, comprising transforming a host cell with a nucleic acid encoding an enediyne polyketide catalytic complex of claim 29, said nucleic acid being operably linked to a promoter, culturing said host cell under conditions such that an enediyne polyketide catalytic complex is produced and catalyzes the synthesis of said enediyne compound.
 48. The method of claim 47, wherein said host cell is a bacterium.
 49. The method of claim 47, wherein said PKSE is selected from the group consisting of: SEQ ID NO: 1, SEQ ID NO: 13, SEQ ID NO: 23, SEQ ID NO: 33, SEQ ID NO: 43, SEQ ID NO: 53, SEQ ID NO: 63, SEQ ID NO: 73, SEQ ID NO: 83 and SEQ ID NO: 93.
 50. The method of claim 47, wherein said TEBC is selected from the group consisting of: SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 15, SEQ ID NO: 25, SEQ ID NO: 35, SEQ ID NO: 45, SEQ ID NO: 55, SEQ ID NO: 65, SEQ ID NO: 75, SEQ ID NO: 85 and SEQ ID NO: 95.